Desmond S. Lun, Niranjan Ratnakar, Muriel Medard, Ralf Koetter,
David R. Karger, Tracey Ho, Ebad Ahmed, and Fang Zhao 


Abstract — We consider the problem of establishing minimum- 
cost multicast connections over coded packet networks, i.e. 
packet networks where the contents of outgoing packets are 
arbitrary, causal functions of the contents of received packets. We 
consider both wireline and wireless packet networks as well as 
both static multicast (where membership of the multicast group 
remains constant for the duration of the connection) and dynamic 
multicast (where membership of the multicast group changes in 
time, with nodes joining and leaving the group). 

For static multicast, we reduce the problem to a polynomial- 
time solvable optimization problem, and we present decentralized 
algorithms for solving it. These algorithms, when coupled with 
existing decentralized schemes for constructing network codes, 
yield a fully decentralized approach for achieving minimum- 
cost multicast. By contrast, establishing minimum-cost static 
multicast connections over routed packet networks is a very 
difficult problem even using centralized computation, except in 
the special cases of unicast and broadcast connections. 

For dynamic multicast, we reduce the problem to a dynamic 
programming problem and apply the theory of dynamic
programming to suggest how it may be solved.

Index Terms —Ad hoc networks, communication networks, 
distributed algorithms, dynamic multicast groups, multicast, 
network coding, network optimization, wireless networks 


I. Introduction 

A typical node in today’s packet networks is capable of two 
functions: forwarding (i.e. copying an incoming packet onto 
an outgoing link) and replicating (i.e. copying an incoming 
packet onto several outgoing links). But there is no intrinsic 
reason why we must assume these are the only functions ever 
permitted to nodes and, in application-level overlay networks
and multi-hop wireless networks, for example, allowing nodes 
to have a wider variety of functions makes sense. We therefore 
consider packet networks where the contents of outgoing 
packets are arbitrary, causal functions of the contents of 
received packets, and we call such networks coded packet 
networks. 

Coded packet networks were put forward by Ahlswede et al. 
[1], and numerous subsequent papers, e.g., [2], [3], [4], [5], [6], 
have built upon their work. These papers, however, all assume 
the availability of dedicated network resources, and scant 
attention is paid to the problem of determining the allocation 
of network resources to dedicate to a particular connection 
or set of connections. This is the problem we tackle. More 
precisely, we aim to find minimum-cost subgraphs that allow 
given multicast connections to be established (with appropriate 
coding) over coded packet networks. 

The analogous problem for routed packet networks is old 
and difficult. It dates to the 1980s and, in the simplest case— 
that of static multicast in wireline networks with linear cost— 
it equates to the Steiner tree problem, which is well-known 
to be NP-complete [7], [8]. The emphasis, therefore, has 
been on heuristic methods. These methods include heuristics 
for the Steiner tree problem on undirected (e.g., [7], [9], 
[8]) and directed (e.g., [10], [11], [12]) graphs, for multicast 
tree generation in wireless networks (e.g. [13]), and for the 
dynamic or on-line Steiner tree problem (e.g., [8], [14], [15]). 
Finding minimum-cost subgraphs in coded packet networks,
however, is much easier and, as we shall see, in many cases
we are able to find optimal subgraphs in polynomial time using
decentralized computation. Moreover, since coded packet
networks are less constrained than routed ones, the minimum
cost for a given connection is generally lower.

In our problem, we take given multicast connections and 
thus include unicast and broadcast connections as special 
cases. But we do not consider optimizing the subgraph for 
multiple connections taking place simultaneously. One reason 
for this is that coding for multiple connections is a very 
difficult problem—one that, in fact, remains currently open 
with only cumbersome bounds on the asymptotic capability 
of coding [16] and examples that demonstrate the
insufficiency of various classes of linear codes [17], [18], [19],
[20]. An obvious, but sub-optimal, approach to coding is to 
code for each connection separately, which is referred to as 
superposition coding [21]. When using superposition coding, 
finding minimum-cost allocations for multiple connections 
means extending the approach for single connections (namely,
the approach taken in this paper) in a straightforward way
that is completely analogous to the extension that needs 
to be done for traditional routed packet networks, and this 
problem of minimum-cost allocations for multiple connections 
using superposition coding is addressed in [22]. An alternative 
approach to coding that outperforms superposition coding, but 
that remains sub-optimal, is discussed in [23]. 

We choose here to restrict our attention to single
connections because the subgraph selection problem is simpler and
because minimum-cost single connections are interesting in 
their own right: Whenever each multicast group has a selfish 
cost objective, or when the network sets link weights to meet 
its objective or enforce certain policies and each multicast 
group is subject to a minimum-weight objective, we wish to 
set up single multicast connections at minimum cost. 

Finally, we mention that a problem related to subgraph
selection, that of throughput maximization, is studied for coded
networks in [24], [25] and that an alternative formulation 
of the subgraph selection problem for coded wireless packet 
networks is given in [26]. 

The body of this paper is composed of four sections:
Sections II and III deal with static multicast (where membership
of the multicast group remains constant for the duration of
the connection) for wireline and wireless packet networks,
respectively; Section IV gives a comparison of the proposed
techniques for static multicast with techniques in routed packet
networks; and Section V deals with dynamic multicast (where
membership of the multicast group changes in time, with
nodes joining and leaving the group). We conclude in
Section VI and, in so doing, we give a sampling of the avenues
for future investigation that our work opens up.


II. Wireline packet networks 


We represent the network with a directed graph $\mathcal{G} = (\mathcal{N}, \mathcal{A})$, where $\mathcal{N}$ is the set of nodes and $\mathcal{A}$ is the set of arcs. Each arc $(i,j)$ represents a lossless point-to-point link from node $i$ to node $j$. We denote by $z_{ij}$ the rate at which coded packets are injected into arc $(i,j)$. The rate vector $z$, consisting of $z_{ij}$, $(i,j) \in \mathcal{A}$, is called a subgraph, and we assume that it must lie within a constraint set $\mathcal{Z}$ for, if not, the packet queues associated with one or more arcs become unstable. We reasonably assume that $\mathcal{Z}$ is a convex subset of the positive orthant containing the origin. We associate with the network a cost function $f$ (reflecting, for example, the average latency or energy consumption) that maps valid rate vectors to real numbers and that we seek to minimize.

Suppose we have a source node $s$ wishing to transmit packets at a positive, real rate $R$ to a non-empty set of sink nodes $\mathcal{T}$. Consider the following optimization problem:


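$$
\begin{aligned}
\text{minimize}\quad & f(z) \\
\text{subject to}\quad & z \in \mathcal{Z}, \\
& z_{ij} \ge x_{ij}^{(t)} \ge 0, \quad \forall\, (i,j) \in \mathcal{A},\ t \in \mathcal{T}, \\
& \sum_{\{j \mid (i,j) \in \mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in \mathcal{A}\}} x_{ji}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},\ t \in \mathcal{T},
\end{aligned}
\tag{1}
$$

where

$$
\sigma_i^{(t)} = \begin{cases} R & \text{if } i = s, \\ -R & \text{if } i = t, \\ 0 & \text{otherwise.} \end{cases}
$$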

Theorem 1: The vector $z$ is part of a feasible solution for the optimization problem (1) if and only if there exists a network code that sets up a multicast connection in the wireline network represented by graph $\mathcal{G}$ at rate arbitrarily close to $R$ from source $s$ to sinks in the set $\mathcal{T}$ and that injects packets at rate arbitrarily close to $z_{ij}$ on each arc $(i,j)$.

Proof: First suppose that $z$ is part of a feasible solution for the problem. Then, for any $t$ in $\mathcal{T}$, we see that the maximum flow from $s$ to $t$ in the network where each arc $(i,j)$ has maximum input rate $z_{ij}$ is at least $R$. So, by Theorem 1 of [1], a coding solution that injects packets at rate arbitrarily close to $z_{ij}$ on each arc $(i,j)$ exists. Conversely, suppose that we have a coding solution that injects packets at rate arbitrarily close to $z_{ij}$ on each arc $(i,j)$. Then the maximum input rate of each arc must be at least $z_{ij}$ and, moreover, again by Theorem 1 of [1], flows of size $R$ exist from $s$ to $t$ for each $t$ in $\mathcal{T}$. Therefore the vector $z$ is part of a feasible solution for the optimization problem. ■

From Theorem 1 it follows immediately that optimization problem (1) finds the optimal cost for an asymptotically achievable, rate-$R$ multicast connection from $s$ to $\mathcal{T}$.
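Theorem 1 turns the feasibility of a candidate subgraph into one max-flow test per sink. As an illustration only (not the authors' code), the Python sketch below runs Edmonds-Karp with arc capacities $z_{ij}$ and checks that every sink can receive rate $R$. The butterfly-shaped graph, its node names and the unit rates are hypothetical and are not claimed to be the network of Figure 1.

```python
from collections import deque


def max_flow(arcs, z, s, t):
    """Edmonds-Karp maximum flow from s to t; arc (i, j) has capacity z[(i, j)]."""
    residual, adj = {}, {}
    for (i, j) in arcs:
        residual[(i, j)] = residual.get((i, j), 0.0) + z[(i, j)]
        residual.setdefault((j, i), 0.0)
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(residual[e] for e in path)   # bottleneck capacity
        for (u, v) in path:
            residual[(u, v)] -= delta
            residual[(v, u)] += delta
        flow += delta


def supports_multicast(arcs, z, s, sinks, R):
    """Theorem 1: z supports rate R iff the max flow to every sink is at least R."""
    return all(max_flow(arcs, z, s, t) >= R - 1e-9 for t in sinks)


# Hypothetical butterfly-shaped graph with every z_ij = 1.
BUTTERFLY = [("s", "a"), ("s", "b"), ("a", "t1"), ("a", "c"), ("b", "t2"),
             ("b", "c"), ("c", "d"), ("d", "t1"), ("d", "t2")]
z = {arc: 1.0 for arc in BUTTERFLY}
print(supports_multicast(BUTTERFLY, z, "s", ["t1", "t2"], 2.0))  # True
```

The test only certifies that a code exists; constructing it is the job of the coding schemes discussed below.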

As an example, consider the network depicted in Figure 1(a). We wish to achieve multicast of unit rate to two sinks, $t_1$ and $t_2$. We have $\mathcal{Z} = [0,1]^{|\mathcal{A}|}$ and $f(z) = \sum_{(i,j)\in\mathcal{A}} a_{ij} z_{ij}$, where $a_{ij}$ is the cost per unit rate shown beside each link. An optimal solution to problem (1) for this network is shown in Figure 1(b). We have flows $x^{(t_1)}$ and $x^{(t_2)}$ of unit size from $s$ to $t_1$ and $t_2$, respectively and, for each arc $(i,j)$, $z_{ij} = \max\{x_{ij}^{(t_1)}, x_{ij}^{(t_2)}\}$, as we expect from the optimization.

To achieve the optimal cost, we code over the subgraph $z$. A code of length 2 for the subgraph is given in [1, Figure 7], which we reproduce in Figure 1(c). In the figure, $X_1$ and $X_2$ refer to the two packets in a coding block. The only coding operation takes place at one of the interior nodes, which receives both $X_1$ and $X_2$ and forms their binary sum, outputting the packet $X_1 + X_2$. The code allows both $t_1$ and $t_2$ to recover both $X_1$ and $X_2$, and it achieves a cost of 19/2.
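The decoding step can be made concrete with byte strings. This sketch assumes a butterfly-shaped layout with our own node labels (s, a, b, c, d, t1, t2) in which node c is the single coding node; it only illustrates that one binary sum lets both sinks recover $X_1$ and $X_2$, and it does not model the link costs of Figure 1.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Binary sum of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def butterfly_block(x1: bytes, x2: bytes):
    """Return the pair (X1, X2) as decoded by sinks t1 and t2 for one block."""
    arc = {}
    arc[("s", "a")] = x1
    arc[("s", "b")] = x2
    arc[("a", "t1")] = arc[("a", "c")] = x1        # replication at a
    arc[("b", "t2")] = arc[("b", "c")] = x2        # replication at b
    arc[("c", "d")] = xor_bytes(arc[("a", "c")], arc[("b", "c")])  # X1 + X2
    arc[("d", "t1")] = arc[("d", "t2")] = arc[("c", "d")]
    # Each sink adds the coded packet to the uncoded packet it already holds.
    t1 = (arc[("a", "t1")], xor_bytes(arc[("a", "t1")], arc[("d", "t1")]))
    t2 = (xor_bytes(arc[("b", "t2")], arc[("d", "t2")]), arc[("b", "t2")])
    return t1, t2


print(butterfly_block(b"hello", b"world"))
```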

Fig. 1. A network with multicast from $s$ to $\mathcal{T} = \{t_1, t_2\}$. (c) Each arc is marked with its code.

Given a solution of problem (1), there are various coding schemes that can be used to realize the connection. The schemes described in [27], [6] operate continuously, with each node continually sending out packets as causal functions of received packets. The schemes described in [1], [2], [3], [4], [5], on the other hand, operate in a block-by-block manner, with each node sending out a block of packets as a function of its received block. In the latter case, the delay incurred by each arc's block is upper bounded by $\delta/R$ for some non-negative integer $\delta$ provided that $z_{ij}/R \in \mathbb{Z}/\delta$ (i.e., $z_{ij}/R$ is an integer multiple of $1/\delta$) for all $(i,j) \in \mathcal{A}$. We unfortunately cannot place such constraints into problem (1), since they would make it prohibitively difficult. An alternative is, given $z$, to take $\lceil \delta z/R \rceil R/\delta$ as the subgraph instead. Since $\lceil \delta z/R \rceil R/\delta < (\delta z/R + 1)R/\delta = z + R/\delta$, we can guarantee that $\lceil \delta z/R \rceil R/\delta$ lies in the constraint set $\mathcal{Z}$ by looking at $z + R/\delta$ instead of $z$, resulting in the optimization problem

$$
\begin{aligned}
\text{minimize}\quad & f(z + R/\delta) \\
\text{subject to}\quad & z + R/\delta \in \mathcal{Z}, \\
& z_{ij} \ge x_{ij}^{(t)} \ge 0, \quad \forall\, (i,j) \in \mathcal{A},\ t \in \mathcal{T}, \\
& \sum_{\{j \mid (i,j) \in \mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in \mathcal{A}\}} x_{ji}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},\ t \in \mathcal{T}.
\end{aligned}
\tag{2}
$$

We see that, by suitable redefinition of $f$ and $\mathcal{Z}$, problem (2) can be reduced to problem (1). Hence, in the remainder of the paper, we focus only on problem (1).
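The rounding step used to obtain problem (2) can be checked numerically. The helper below is hypothetical, as are the arc labels and rates; it replaces each rate by $\lceil \delta z_{ij}/R \rceil R/\delta$ and confirms that the result is an integer multiple of $R/\delta$ lying strictly below $z_{ij} + R/\delta$, as in the inequality above.

```python
import math


def round_to_block_grid(z, R, delta):
    """Map each z_ij to ceil(delta * z_ij / R) * R / delta (delta a positive integer)."""
    return {arc: math.ceil(delta * rate / R) * R / delta for arc, rate in z.items()}


z = {"e1": 0.3, "e2": 0.5, "e3": 0.0}   # hypothetical rates
R, delta = 1.0, 4
rounded = round_to_block_grid(z, R, delta)
for arc, rate in z.items():
    # Integer multiple of R/delta, and never more than z_ij + R/delta.
    assert rounded[arc] * delta / R == round(rounded[arc] * delta / R)
    assert rate <= rounded[arc] < rate + R / delta
print(rounded)  # {'e1': 0.5, 'e2': 0.5, 'e3': 0.0}
```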

A. Linear, separable cost and separable constraints 

The case of linear, separable cost and separable constraints addresses scenarios where a fixed cost (e.g., monetary cost, energy cost, or imaginary weight cost) is paid per unit rate placed on an arc and each arc is subject to a separate constraint (the closed interval from 0 to some non-negative capacity). This is the case in the network depicted in Figure 1(a). So, with each arc we associate non-negative numbers $a_{ij}$ and $c_{ij}$, which are the cost per unit rate and the capacity of the arc, respectively. Hence, the optimization problem (1) becomes the following linear optimization problem.

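$$
\begin{aligned}
\text{minimize}\quad & \sum_{(i,j)\in\mathcal{A}} a_{ij} z_{ij} \\
\text{subject to}\quad & c_{ij} \ge z_{ij}, \quad \forall\, (i,j) \in \mathcal{A}, \\
& z_{ij} \ge x_{ij}^{(t)} \ge 0, \quad \forall\, (i,j) \in \mathcal{A},\ t \in \mathcal{T}, \\
& \sum_{\{j \mid (i,j) \in \mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in \mathcal{A}\}} x_{ji}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},\ t \in \mathcal{T}.
\end{aligned}
\tag{3}
$$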

Unfortunately, the linear optimization problem (3) as it stands requires centralized computation with full knowledge of the network. Motivated by successful network algorithms such as distributed Bellman-Ford [28, Section 5.2], we seek a decentralized method for solving problem (3), which, when married with decentralized schemes for constructing network codes [5], [6], [27], results in a fully decentralized approach for achieving minimum-cost multicast in the case of linear, separable cost and separable constraints.

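Toward the end of developing such an algorithm, we consider the Lagrangian dual problem

$$
\begin{aligned}
\text{maximize}\quad & \sum_{t \in \mathcal{T}} q^{(t)}(p^{(t)}) \\
\text{subject to}\quad & \sum_{t \in \mathcal{T}} p_{ij}^{(t)} = a_{ij}, \quad \forall\, (i,j) \in \mathcal{A}, \\
& p_{ij}^{(t)} \ge 0, \quad \forall\, (i,j) \in \mathcal{A},\ t \in \mathcal{T},
\end{aligned}
\tag{4}
$$

where

$$
q^{(t)}(p^{(t)}) := \min_{x^{(t)} \in \mathcal{F}^{(t)}} \sum_{(i,j) \in \mathcal{A}} p_{ij}^{(t)} x_{ij}^{(t)},
\tag{5}
$$

and $\mathcal{F}^{(t)}$ is the bounded polyhedron of points $x^{(t)}$ satisfying the conservation of flow constraints

$$
\sum_{\{j \mid (i,j) \in \mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in \mathcal{A}\}} x_{ji}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},
$$

and capacity constraints

$$
0 \le x_{ij}^{(t)} \le c_{ij}, \quad \forall\, (i,j) \in \mathcal{A}.
$$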

Subproblem (5) is a standard linear minimum-cost flow problem, which can be solved using a multitude of different methods (see, for example, [29, Chapters 4-7] or [30, Chapters 9-11]); in particular, it can be solved in an asynchronous, distributed manner using the ε-relaxation method [31, Sections 5.3 and 6.5]. In addition, if the connection rate is small compared to the arc capacities (more precisely, if $R \le c_{ij}$ for all $(i,j) \in \mathcal{A}$), then subproblem (5) reduces to a shortest path problem, which admits a simple, asynchronous, distributed solution [28, Section 5.2].
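In the small-rate case, subproblem (5) amounts to routing all $R$ units of commodity $t$ along a shortest $s$-$t$ path under the prices $p_{ij}^{(t)}$. The centralized Bellman-Ford sketch below illustrates that reduction on a hypothetical graph with made-up prices; a distributed version in the spirit of [28, Section 5.2] would have each node update its own distance estimate from its neighbours' estimates instead of sweeping over all arcs. The sketch assumes that the sink is reachable from the source.

```python
import math


def shortest_path_flow(nodes, arcs, price, s, t, R):
    """Subproblem (5) when R <= c_ij: send R units along a shortest s-t path.

    Returns (R * path length, flow dict x^(t))."""
    dist = {v: math.inf for v in nodes}
    pred = {v: None for v in nodes}
    dist[s] = 0.0
    for _ in range(len(nodes) - 1):          # Bellman-Ford relaxation sweeps
        changed = False
        for (i, j) in arcs:
            if dist[i] + price[(i, j)] < dist[j]:
                dist[j], pred[j] = dist[i] + price[(i, j)], i
                changed = True
        if not changed:
            break
    x = {arc: 0.0 for arc in arcs}
    v = t
    while v != s:                            # walk back along the predecessor tree
        x[(pred[v], v)] = R
        v = pred[v]
    return R * dist[t], x


NODES = ["s", "a", "b", "c", "d", "t1", "t2"]
ARCS = [("s", "a"), ("s", "b"), ("a", "t1"), ("a", "c"), ("b", "t2"),
        ("b", "c"), ("c", "d"), ("d", "t1"), ("d", "t2")]
price = {arc: 1.0 for arc in ARCS}
price[("a", "t1")], price[("s", "b")] = 5.0, 2.0   # hypothetical prices p_ij^(t1)
value, x = shortest_path_flow(NODES, ARCS, price, "s", "t1", R=1.0)
print(value, [arc for arc in ARCS if x[arc] > 0])
```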

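Now, to solve the dual problem (4), we employ subgradient optimization (see, for example, [32, Section 6.3.1] or [33, Section 1.2.4]). We start with an iterate $p[0]$ in the feasible set of (4) and, given an iterate $p[n]$ for some non-negative integer $n$, we solve subproblem (5) for each $t$ in $\mathcal{T}$ to obtain $x[n]$. We then assign

$$
p_{ij}[n+1] := \arg\min_{v \in \mathcal{P}_{ij}} \sum_{t \in \mathcal{T}} \left( v^{(t)} - \left( p_{ij}^{(t)}[n] + \theta[n]\, x_{ij}^{(t)}[n] \right) \right)^2
\tag{6}
$$

for each $(i,j) \in \mathcal{A}$, where $\mathcal{P}_{ij}$ is the $|\mathcal{T}|$-dimensional simplex

$$
\mathcal{P}_{ij} = \left\{ v \;\middle|\; \sum_{t \in \mathcal{T}} v^{(t)} = a_{ij},\ v \ge 0 \right\}
$$

and $\theta[n] > 0$ is an appropriate step size. Thus, $p_{ij}[n+1]$ is set to be the Euclidean projection of $p_{ij}[n] + \theta[n] x_{ij}[n]$ onto $\mathcal{P}_{ij}$.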

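To perform the projection, we use the following algorithm, the justification of which we defer to Appendix I. Let $u := p_{ij}[n] + \theta[n] x_{ij}[n]$ and suppose we index the elements of $\mathcal{T}$ such that $u^{(t_1)} \ge u^{(t_2)} \ge \cdots \ge u^{(t_{|\mathcal{T}|})}$. Take $\hat{k}$ to be the smallest $k$ such that

$$
\frac{1}{k} \left( a_{ij} - \sum_{r=1}^{k} u^{(t_r)} \right) \le -u^{(t_{k+1})},
$$

or set $\hat{k} = |\mathcal{T}|$ if no such $k$ exists. Then the projection is achieved by

$$
p_{ij}^{(t)}[n+1] = \begin{cases} u^{(t)} + \dfrac{a_{ij} - \sum_{r=1}^{\hat{k}} u^{(t_r)}}{\hat{k}} & \text{if } t \in \{t_1, \dots, t_{\hat{k}}\}, \\ 0 & \text{otherwise.} \end{cases}
$$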
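The projection rule translates directly into code. The function below is a sketch of that rule for a single arc (function and argument names are ours): `u` holds $u^{(t)}$ for the sinks in some fixed order and `a` plays the role of $a_{ij}$.

```python
def project_onto_simplex(u, a):
    """Euclidean projection of u onto {v : v >= 0, sum(v) = a}.

    Sort u in decreasing order, take k_hat as the smallest k with
    (a - sum of the k largest entries) / k <= -(the (k+1)-th largest entry),
    or k_hat = len(u) if there is none; shift the k_hat largest entries by a
    common amount and set the others to zero."""
    order = sorted(range(len(u)), key=lambda t: u[t], reverse=True)
    srt = [u[t] for t in order]
    k_hat = len(u)
    partial = 0.0
    for k in range(1, len(u)):
        partial += srt[k - 1]
        if (a - partial) / k <= -srt[k]:
            k_hat = k
            break
    shift = (a - sum(srt[:k_hat])) / k_hat
    v = [0.0] * len(u)
    for r in range(k_hat):
        v[order[r]] = srt[r] + shift
    return v


# One arc with a_ij = 1 shared by two sinks, after t1's price was pushed up.
print(project_onto_simplex([0.8, 0.5], 1.0))
```

Sorting dominates the work, so each arc needs $O(|\mathcal{T}| \log |\mathcal{T}|)$ operations per iteration.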


The disadvantage of subgradient optimization is that, whilst it yields good approximations of the optimal value of the Lagrangian dual problem (4) after sufficient iteration, it does not necessarily yield a primal optimal solution. There are, however, methods for recovering primal solutions in subgradient optimization. We employ the following method, which is due to Sherali and Choi [34].

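Let $\{\mu_l[n]\}_{l=1,\dots,n}$ be a sequence of convex combination weights for each non-negative integer $n$, i.e., $\sum_{l=1}^{n} \mu_l[n] = 1$ and $\mu_l[n] \ge 0$ for all $l = 1, \dots, n$. Further, let us define

$$
\gamma_{ln} := \frac{\mu_l[n]}{\theta[l]}, \quad l = 1, \dots, n,\ n = 0, 1, \dots,
$$

and

$$
\Delta\gamma_n^{\max} := \max_{l = 2, \dots, n} \{ \gamma_{ln} - \gamma_{(l-1)n} \}.
$$

If the step sizes $\{\theta[n]\}$ and convex combination weights $\{\mu_l[n]\}$ are chosen such that

1) $\gamma_{ln} \ge \gamma_{(l-1)n}$ for all $l = 2, \dots, n$ and $n = 0, 1, \dots$,

2) $\Delta\gamma_n^{\max} \to 0$ as $n \to \infty$, and

3) $\gamma_{1n} \to 0$ as $n \to \infty$ and $\gamma_{nn} \le \delta$ for all $n = 0, 1, \dots$ for some $\delta > 0$,

then we obtain an optimal solution to the primal problem (3) from any accumulation point of the sequence of primal iterates $\{\tilde{x}[n]\}$ given by

$$
\tilde{x}[n] := \sum_{l=1}^{n} \mu_l[n]\, x[l], \quad n = 0, 1, \dots.
\tag{7}
$$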

We justify this primal recovery method in Appendix II.

The required conditions on the step sizes and convex combination weights are satisfied by the following choices [34, Corollaries 2-4]:

1) step sizes $\{\theta[n]\}$ such that $\theta[n] > 0$, $\lim_{n \to \infty} \theta[n] = 0$, $\sum_{n=0}^{\infty} \theta[n] = \infty$, and convex combination weights $\{\mu_l[n]\}$ given by $\mu_l[n] = \theta[l] / \sum_{k=1}^{n} \theta[k]$ for all $l = 1, \dots, n$, $n = 0, 1, \dots$;

2) step sizes given by $\theta[n] = a/(b + cn)$ for all $n = 0, 1, \dots$, where $a > 0$, $b > 0$ and $c > 0$, and convex combination weights $\{\mu_l[n]\}$ given by $\mu_l[n] = 1/n$ for all $l = 1, \dots, n$, $n = 0, 1, \dots$; and

3) step sizes $\{\theta[n]\}$ given by $\theta[n] = n^{-\alpha}$ for all $n = 0, 1, \dots$, where $0 < \alpha < 1$, and convex combination weights $\{\mu_l[n]\}$ given by $\mu_l[n] = 1/n$ for all $l = 1, \dots, n$, $n = 0, 1, \dots$.

Moreover, for all three choices, we have $\mu_l[n+1]/\mu_l[n]$ independent of $l$ for all $n$, so primal iterates can be computed iteratively using

$$
\begin{aligned}
\tilde{x}[n] &= \sum_{l=1}^{n} \mu_l[n]\, x[l] \\
&= \sum_{l=1}^{n-1} \mu_l[n]\, x[l] + \mu_n[n]\, x[n] \\
&= \phi[n-1]\, \tilde{x}[n-1] + \mu_n[n]\, x[n],
\end{aligned}
$$

where $\phi[n] := \mu_l[n+1]/\mu_l[n]$.
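As a quick check of this recursion, the sketch below (the function name is ours) computes $\tilde{x}[n]$ incrementally for the first choice of weights, $\mu_l[n] = \theta[l]/\sum_{k=1}^{n}\theta[k]$, for which $\phi[n-1] = \sum_{k=1}^{n-1}\theta[k] / \sum_{k=1}^{n}\theta[k]$ and $\mu_n[n] = \theta[n]/\sum_{k=1}^{n}\theta[k]$. The iterates and step sizes in the usage lines are arbitrary.

```python
def averaged_primal_iterates(iterates, steps):
    """Return [x~[1], ..., x~[N]] for the weights mu_l[n] = theta[l] / sum_{k<=n} theta[k],
    computed with the recursion x~[n] = phi[n-1] x~[n-1] + mu_n[n] x[n]."""
    averages, x_bar, total = [], None, 0.0
    for x, theta in zip(iterates, steps):
        previous_total = total
        total += theta
        if x_bar is None:
            x_bar = list(x)                   # mu_1[1] = 1
        else:
            phi = previous_total / total      # phi[n-1] = mu_l[n] / mu_l[n-1]
            mu_n = theta / total              # mu_n[n]
            x_bar = [phi * xb + mu_n * xi for xb, xi in zip(x_bar, x)]
        averages.append(x_bar)
    return averages


iterates = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]   # arbitrary x[1], x[2], x[3]
steps = [1.0, 0.5, 0.25]                          # arbitrary theta[1], theta[2], theta[3]
print(averaged_primal_iterates(iterates, steps)[-1])
```

Only the running sum of step sizes and the current average need to be stored, which is what makes the recovery cheap at each node.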

We now have a relatively simple algorithm for computing optimal feasible solutions to problem (3) in a decentralized manner, with computation taking place at each node, which needs only to be aware of the capacities and costs of its incoming and outgoing arcs. For example, for all arcs $(i,j)$ in $\mathcal{A}$, we can set $p_{ij}^{(t)}[0] = a_{ij}/|\mathcal{T}|$ at both nodes $i$ and $j$.

Since each node has the capacities and costs of its incoming and outgoing arcs for subproblem (5) for each $t \in \mathcal{T}$, we can apply the ε-relaxation method to obtain flows $x^{(t)}[0]$ for each $t \in \mathcal{T}$, which we use to compute $p_{ij}[1]$ and $\tilde{x}_{ij}[0]$ at both nodes $i$ and $j$ using equations (6) and (7), respectively. We then re-apply the ε-relaxation method and so on.
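Putting the pieces together, the following single-process sketch simulates the algorithm on a hypothetical butterfly-shaped instance with unit costs, unit capacities and $R = 1$, so that the shortest-path form of subproblem (5) applies in place of ε-relaxation. It uses the second choice of step sizes and weights ($\theta[n] = 1/(1+n)$, $\mu_l[n] = 1/n$); the graph, the names and the iteration count are our own assumptions, and a real deployment would run the same updates at the nodes rather than in one loop. Whatever the iteration count, weak duality sandwiches the optimum: the best dual value found is a lower bound, and the recovered subgraph $\tilde{z}_{ij} = \max_t \tilde{x}_{ij}^{(t)}$ is feasible for (3), so its cost is an upper bound. For this instance the optimal cost is 4 (two arc-disjoint two-hop paths).

```python
import math

# Hypothetical unit-cost instance (butterfly-shaped; names are ours).
NODES = ["s", "a", "b", "c", "d", "t1", "t2"]
ARCS = [("s", "a"), ("s", "b"), ("a", "t1"), ("a", "c"), ("b", "t2"),
        ("b", "c"), ("c", "d"), ("d", "t1"), ("d", "t2")]
COST = {arc: 1.0 for arc in ARCS}   # a_ij; every c_ij = 1
SINKS = ["t1", "t2"]
SOURCE, R = "s", 1.0                # R <= c_ij, so (5) is a shortest-path problem


def shortest_path(price, sink):
    """Bellman-Ford from SOURCE; returns (path length, flow of R along the path)."""
    dist = {v: math.inf for v in NODES}
    pred = {v: None for v in NODES}
    dist[SOURCE] = 0.0
    for _ in range(len(NODES) - 1):
        for (i, j) in ARCS:
            if dist[i] + price[(i, j)] < dist[j]:
                dist[j], pred[j] = dist[i] + price[(i, j)], i
    flow = {arc: 0.0 for arc in ARCS}
    v = sink
    while v != SOURCE:
        flow[(pred[v], v)] = R
        v = pred[v]
    return dist[sink], flow


def project(u, a):
    """Projection onto {v >= 0, sum(v) = a}: the sort-and-threshold rule, max(., 0) form."""
    srt = sorted(u, reverse=True)
    k_hat, partial = len(u), 0.0
    for k in range(1, len(u)):
        partial += srt[k - 1]
        if (a - partial) / k <= -srt[k]:
            k_hat = k
            break
    shift = (a - sum(srt[:k_hat])) / k_hat
    return [max(ui + shift, 0.0) for ui in u]


def subgradient_multicast(iterations=300):
    # Dual-feasible start: p_ij^(t) = a_ij / |T|.
    p = {arc: {t: COST[arc] / len(SINKS) for t in SINKS} for arc in ARCS}
    x_bar = {t: {arc: 0.0 for arc in ARCS} for t in SINKS}
    best_dual = -math.inf
    for n in range(1, iterations + 1):
        x, dual = {}, 0.0
        for t in SINKS:                     # subproblem (5) for each sink
            length, x[t] = shortest_path({arc: p[arc][t] for arc in ARCS}, t)
            dual += R * length              # q^(t)(p^(t))
        best_dual = max(best_dual, dual)
        for t in SINKS:                     # primal recovery with mu_l[n] = 1/n
            for arc in ARCS:
                x_bar[t][arc] += (x[t][arc] - x_bar[t][arc]) / n
        theta = 1.0 / (1.0 + n)             # theta = a / (b + c n), a = b = c = 1
        for arc in ARCS:                    # subgradient step, then projection (6)
            u = [p[arc][t] + theta * x[t][arc] for t in SINKS]
            for t, value in zip(SINKS, project(u, COST[arc])):
                p[arc][t] = value
    z_bar = {arc: max(x_bar[t][arc] for t in SINKS) for arc in ARCS}
    return best_dual, z_bar, x_bar


best_dual, z_bar, x_bar = subgradient_multicast()
recovered_cost = sum(COST[arc] * z_bar[arc] for arc in ARCS)
print(f"lower bound {best_dual:.3f} <= optimum <= recovered cost {recovered_cost:.3f}")
```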

Although the decentralized algorithm that we have just discussed could perhaps be extended to convex cost functions (by modifying the dual problem and employing the ε-relaxation method for convex cost network flow problems [35], [36]), a significantly more direct and natural method is possible, which we proceed to present.


B. Convex, separable cost and separable constraints 


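Let us now consider the case where, rather than a cost per unit rate for each arc, we have a convex, monotonically increasing cost function $f_{ij}$ for arc $(i,j)$. Such cost functions arise naturally when the cost is, e.g., latency or congestion. The optimization problem (1) becomes the following convex optimization problem.

$$
\begin{aligned}
\text{minimize}\quad & \sum_{(i,j)\in\mathcal{A}} f_{ij}(z_{ij}) \\
\text{subject to}\quad & z_{ij} \ge x_{ij}^{(t)} \ge 0, \quad \forall\, (i,j) \in \mathcal{A},\ t \in \mathcal{T}, \\
& \sum_{\{j \mid (i,j) \in \mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in \mathcal{A}\}} x_{ji}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},\ t \in \mathcal{T}.
\end{aligned}
\tag{8}
$$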


Note that the capacity constraints have been removed, since 
they can be enforced by making arcs arbitrarily costly as 
their flows approach their respective capacities. We again seek 
a decentralized method for solving the subgraph selection 
problem. 

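We note that $z_{ij} = \max_{t \in \mathcal{T}} x_{ij}^{(t)}$ at an optimal solution of problem (8) and that $f_{ij}(\max_{t \in \mathcal{T}} x_{ij}^{(t)})$ is a convex function of $x_{ij}$, since a monotonically increasing, convex function of a convex function is convex. Hence it follows that problem (8) can be restated as the following convex optimization problem.

$$
\begin{aligned}
\text{minimize}\quad & \sum_{(i,j)\in\mathcal{A}} f_{ij}(z_{ij}) \\
\text{subject to}\quad & z_{ij} = \max_{t \in \mathcal{T}} x_{ij}^{(t)}, \quad \forall\, (i,j) \in \mathcal{A}, \\
& \sum_{\{j \mid (i,j) \in \mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in \mathcal{A}\}} x_{ji}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},\ t \in \mathcal{T}, \\
& x_{ij}^{(t)} \ge 0, \quad \forall\, (i,j) \in \mathcal{A},\ t \in \mathcal{T}.
\end{aligned}
\tag{9}
$$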


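Unfortunately, the max function is not everywhere differentiable, and this can pose problems for algorithm design. We therefore solve the following modification of problem (9), where the max norm is replaced by an $\ell^n$-norm. This replacement was originally proposed in [37].

$$
\begin{aligned}
\text{minimize}\quad & \sum_{(i,j)\in\mathcal{A}} f_{ij}(z'_{ij}) \\
\text{subject to}\quad & z'_{ij} = \left( \sum_{t \in \mathcal{T}} \left( x_{ij}^{(t)} \right)^n \right)^{1/n}, \quad \forall\, (i,j) \in \mathcal{A}, \\
& \sum_{\{j \mid (i,j) \in \mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in \mathcal{A}\}} x_{ji}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},\ t \in \mathcal{T}, \\
& x_{ij}^{(t)} \ge 0, \quad \forall\, (i,j) \in \mathcal{A},\ t \in \mathcal{T}.
\end{aligned}
\tag{10}
$$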


We have that $z'_{ij} \ge z_{ij}$ for all $n > 0$ and that $z'_{ij}$ approaches $z_{ij}$ as $n$ approaches infinity. Thus, we shall assume that $n$ is large and attempt to develop a decentralized algorithm to solve problem (10). Note that, since $z'_{ij} \ge z_{ij}$, a code with rate $z'_{ij}$ on each arc $(i,j)$ exists for any feasible solution.
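Why $n$ must be large follows from the sandwich $z_{ij} \le z'_{ij} \le |\mathcal{T}|^{1/n} z_{ij}$, with $z_{ij} = \max_t x_{ij}^{(t)}$: the overshoot shrinks only like $|\mathcal{T}|^{1/n}$. The sketch below evaluates $z'_{ij}$ for a few values of $n$ on made-up per-sink flows for a single arc.

```python
def ln_norm(flows, n):
    """z'_ij = (sum over t of (x_ij^(t))^n)^(1/n) for a single arc."""
    return sum(x ** n for x in flows) ** (1.0 / n)


flows = [0.3, 0.7, 0.5]   # hypothetical x_ij^(t) for three sinks; the max is 0.7
for n in (1, 2, 4, 8, 16, 64):
    bound = len(flows) ** (1.0 / n) * max(flows)
    print(n, round(ln_norm(flows, n), 4), "<=", round(bound, 4))
```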

Problem (10) is a convex multicommodity flow problem. There are many algorithms for convex multicommodity flow problems (see [38] for a survey), some of which (e.g., the algorithms in [39], [40]) are well-suited for decentralized implementation. These algorithms can certainly be used, but, in this paper, we propose solving problem (10) using a primal-dual algorithm derived from the primal-dual approach to internet congestion control (see [41, Section 3.4]).

We restrict ourselves to the case where $\{f_{ij}\}$ are strictly convex. Since the variable $z'_{ij}$ is a strictly convex function of $x_{ij}$, it follows that the objective function for problem (10) is strictly convex, so the problem admits a unique solution for any integer $n > 0$. Let $U(x) := -\sum_{(i,j)\in\mathcal{A}} f_{ij}(z'_{ij})$, and let $(y)^+_x$ for $x \ge 0$ denote the following function of $y$:

$$
(y)^+_x = \begin{cases} y & \text{if } x > 0, \\ \max\{y, 0\} & \text{if } x \le 0. \end{cases}
$$
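In code, $(y)^+_x$ is a one-line guard. In the multiplier update (13), $x$ is the multiplier $\lambda_{ij}^{(t)}$ and $y$ is $-x_{ij}^{(t)}$, so a multiplier sitting at zero is never driven negative. The function name `plus` is our own.

```python
def plus(y, x):
    """(y)^+_x: returns y when x > 0, and max(y, 0) when x <= 0."""
    return y if x > 0 else max(y, 0.0)


# A negative drive on a variable sitting at zero is blocked; otherwise it passes through.
print(plus(-0.4, 0.0), plus(-0.4, 0.2), plus(0.3, 0.0))
```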


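Consider the following continuous-time primal-dual algorithm:

$$
\begin{aligned}
\dot{x}_{ij}^{(t)} &= k_{ij}^{(t)}(x_{ij}^{(t)}) \left( \frac{\partial U(x)}{\partial x_{ij}^{(t)}} - q_{ij}^{(t)} + \lambda_{ij}^{(t)} \right), && (11) \\
\dot{p}_i^{(t)} &= h_i^{(t)}(p_i^{(t)}) \left( y_i^{(t)} - \sigma_i^{(t)} \right), && (12) \\
\dot{\lambda}_{ij}^{(t)} &= m_{ij}^{(t)}(\lambda_{ij}^{(t)}) \left( -x_{ij}^{(t)} \right)^+_{\lambda_{ij}^{(t)}}, && (13)
\end{aligned}
$$

where

$$
q_{ij}^{(t)} := p_i^{(t)} - p_j^{(t)}, \qquad y_i^{(t)} := \sum_{\{j \mid (i,j) \in \mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in \mathcal{A}\}} x_{ji}^{(t)},
$$

and $k_{ij}^{(t)}(x_{ij}^{(t)}) > 0$, $h_i^{(t)}(p_i^{(t)}) > 0$, and $m_{ij}^{(t)}(\lambda_{ij}^{(t)}) > 0$ are non-decreasing continuous functions of $x_{ij}^{(t)}$, $p_i^{(t)}$, and $\lambda_{ij}^{(t)}$, respectively.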

Proposition 1: The algorithm specified by Equations (11)-(13) is globally, asymptotically stable.

Proof: See Appendix III. ■

The global, asymptotic stability of the algorithm implies that no matter what the initial choice of $(x, p)$ is, the primal-dual algorithm will converge to the unique solution of problem (10). We have to choose $\lambda$, however, with non-negative entries as the initial choice.

We associate a processor with each arc $(i,j)$ and node $i$. In a typical setting where there is one processor at every node, we could assign the processor at a node to be its own processor as well as the processor for all its outgoing arcs.

We assume that the processor for node $i$ keeps track of the variables $\{p_i^{(t)}\}_{t \in \mathcal{T}}$, while the processor for arc $(i,j)$ keeps track of the variables $\{\lambda_{ij}^{(t)}\}_{t \in \mathcal{T}}$ and $\{x_{ij}^{(t)}\}_{t \in \mathcal{T}}$. With this assumption, the algorithm is decentralized in the following sense:

• a node processor needs only to exchange information with the processors for arcs coming in or out of the node; and

• an arc processor needs only to exchange information with the processors for nodes that it is connected to.

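This fact is evident from equations (11)-(13) by noting that

$$
\frac{\partial U(x)}{\partial x_{ij}^{(t)}} = -f'_{ij}(z'_{ij}) \left( \frac{x_{ij}^{(t)}}{z'_{ij}} \right)^{n-1}.
$$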
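The closed-form derivative can be checked against a central finite difference. In this sketch the arc cost $f(z) = z^2$, the per-sink flows and $n = 4$ are hypothetical choices; only the arc that carries $x_{ij}^{(t)}$ contributes to the derivative, so a single arc suffices.

```python
def arc_objective(xs, f, n):
    """Contribution -f(z'_ij) of one arc to U(x); xs holds x_ij^(t) for each sink."""
    z = sum(x ** n for x in xs) ** (1.0 / n)
    return -f(z)


def arc_gradient(xs, t, f_prime, n):
    """Closed form from the text: -f'(z'_ij) * (x_ij^(t) / z'_ij)^(n - 1)."""
    z = sum(x ** n for x in xs) ** (1.0 / n)
    return -f_prime(z) * (xs[t] / z) ** (n - 1)


def f(z):          # hypothetical strictly convex arc cost
    return z ** 2


def f_prime(z):
    return 2.0 * z


xs, n, h = [0.4, 0.7, 0.2], 4, 1e-6
for t in range(len(xs)):
    up, down = list(xs), list(xs)
    up[t] += h
    down[t] -= h
    numeric = (arc_objective(up, f, n) - arc_objective(down, f, n)) / (2 * h)
    print(t, arc_gradient(xs, t, f_prime, n), numeric)
```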


In implementing the primal-dual algorithm, we must bear the following points in mind.

• The primal-dual algorithm in (11)-(13) is a continuous-time algorithm. To discretize the algorithm, we consider time steps $m = 1, 2, \dots$ and replace the derivatives by differences:

$$
\begin{aligned}
x_{ij}^{(t)}[m+1] &= x_{ij}^{(t)}[m] + \alpha_{ij}^{(t)}[m] \left( \frac{\partial U(x[m])}{\partial x_{ij}^{(t)}[m]} - q_{ij}^{(t)}[m] + \lambda_{ij}^{(t)}[m] \right), \\
p_i^{(t)}[m+1] &= p_i^{(t)}[m] + \beta_i^{(t)}[m] \left( y_i^{(t)}[m] - \sigma_i^{(t)} \right), \\
\lambda_{ij}^{(t)}[m+1] &= \lambda_{ij}^{(t)}[m] + \gamma_{ij}^{(t)}[m] \left( -x_{ij}^{(t)}[m] \right)^+_{\lambda_{ij}^{(t)}[m]},
\end{aligned}
$$

where

$$
q_{ij}^{(t)}[m] := p_i^{(t)}[m] - p_j^{(t)}[m], \qquad y_i^{(t)}[m] := \sum_{\{j \mid (i,j) \in \mathcal{A}\}} x_{ij}^{(t)}[m] - \sum_{\{j \mid (j,i) \in \mathcal{A}\}} x_{ji}^{(t)}[m],
$$

and $\alpha_{ij}^{(t)}[m] > 0$, $\beta_i^{(t)}[m] > 0$, and $\gamma_{ij}^{(t)}[m] > 0$ can be thought of as step sizes.

• While the algorithm is guaranteed to converge to the optimum solution, the value of the variables at any time instant $m$ is not necessarily a feasible solution. A start-up time is required before a feasible solution is computed.

• Unfortunately, the above algorithm is a synchronous algorithm where the various processors need to exchange information at regular intervals. It is an interesting problem to investigate an asynchronous implementation of the primal-dual algorithm.
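To see the discretized updates in action, the sketch below applies them to a small hypothetical instance: a direct arc $(s,t)$ and a two-hop route through a node $u$, a single sink (so $z'_{ij} = x_{ij}$ for any $n$), unit rate, and $f_{ij}(z) = z^2$ on every arc. Minimizing $x_{st}^2 + 2x_{su}^2$ subject to $x_{st} + x_{su} = 1$ gives $x_{st} = 2/3$ and $x_{su} = x_{ut} = 1/3$, which the iteration should approach. The instance, the constant step size of 0.05 and the clamping of the multipliers at zero after each discrete step are our own assumptions, not part of the algorithm as stated.

```python
def plus(y, x):
    """(y)^+_x from the text: y if x > 0, else max(y, 0)."""
    return y if x > 0 else max(y, 0.0)


def primal_dual(nodes, arcs, f_prime, sigma, step=0.05, iterations=4000):
    """Discretized (11)-(13) for a single sink (|T| = 1, so z'_ij = x_ij for any n).

    Constant step sizes; multipliers are clamped at 0 after each step
    (our own discrete-time safeguard)."""
    x = {arc: 0.0 for arc in arcs}
    lam = {arc: 0.0 for arc in arcs}          # lambda must start non-negative
    p = {v: 0.0 for v in nodes}
    for _ in range(iterations):
        # y_i: net outflow at node i under the current flows.
        y = {v: 0.0 for v in nodes}
        for (i, j), flow in x.items():
            y[i] += flow
            y[j] -= flow
        new_x = {}
        for (i, j) in arcs:
            q = p[i] - p[j]
            grad = -f_prime[(i, j)](x[(i, j)])  # dU/dx_ij = -f'_ij(x_ij) when |T| = 1
            new_x[(i, j)] = x[(i, j)] + step * (grad - q + lam[(i, j)])
        for v in nodes:
            p[v] += step * (y[v] - sigma[v])
        for arc in arcs:
            lam[arc] = max(lam[arc] + step * plus(-x[arc], lam[arc]), 0.0)
        x = new_x
    return x, lam, p


NODES = ["s", "u", "t"]
ARCS = [("s", "t"), ("s", "u"), ("u", "t")]
F_PRIME = {arc: (lambda z: 2.0 * z) for arc in ARCS}   # f_ij(z) = z**2
SIGMA = {"s": 1.0, "u": 0.0, "t": -1.0}
x, lam, p = primal_dual(NODES, ARCS, F_PRIME, SIGMA)
print({arc: round(v, 4) for arc, v in x.items()})
```

Each arc update reads only the prices of its two end nodes, and each price update reads only the flows on adjacent arcs, which mirrors the decentralization argument above.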


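C. Elastic rate demand

We have thus far focused on the case of an inelastic rate demand, which is presumably provided by a separate flow control algorithm. But this flow control does not necessarily need to be done separately. Thus, we now suppose that the rate demand is elastic and that it is represented by a utility function that has the same units as the cost function, and we seek to maximize utility minus cost. We continue to assume strictly convex, separable cost and separable constraints.

We associate with the source a utility function $U_r$ such that $U_r(R)$ is the utility derived by the source when $R$ is the data rate. The function $U_r$ is assumed to be strictly concave and increasing. Hence, in this setup, the problem we address is as follows:

$$
\begin{aligned}
\text{maximize}\quad & U(x, R) \\
\text{subject to}\quad & \sum_{\{j \mid (i,j) \in \mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in \mathcal{A}\}} x_{ji}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N} \setminus \{t\},\ t \in \mathcal{T}, \\
& R \ge 0, \\
& x_{ij}^{(t)} \ge 0, \quad \forall\, (i,j) \in \mathcal{A},\ t \in \mathcal{T},
\end{aligned}
\tag{14}
$$

where $U(x, R) := U_r(R) - \sum_{(i,j) \in \mathcal{A}} f_{ij}(z'_{ij})$.

In problem (14), some of the flow constraints have been dropped by making the observation that the equality constraints at a sink $t$, namely

$$
\sum_{\{j \mid (t,j) \in \mathcal{A}\}} x_{tj}^{(t)} - \sum_{\{j \mid (j,t) \in \mathcal{A}\}} x_{jt}^{(t)} = \sigma_t^{(t)} = -R,
$$

follow from the constraints at the source and at the other nodes. The dropping of these constraints is crucial to the proof that the algorithm presented in the sequel is decentralized.

This problem can be solved by the following primal-dual algorithm:

$$
\begin{aligned}
\dot{x}_{ij}^{(t)} &= k_{ij}^{(t)}(x_{ij}^{(t)}) \left( \frac{\partial U(x,R)}{\partial x_{ij}^{(t)}} - q_{ij}^{(t)} + \lambda_{ij}^{(t)} \right), \\
\dot{R} &= k_R(R) \left( \frac{\partial U(x,R)}{\partial R} - q_R + \lambda_R \right), \\
\dot{p}_i^{(t)} &= h_i^{(t)}(p_i^{(t)})\, y_i^{(t)}, \\
\dot{\lambda}_{ij}^{(t)} &= m_{ij}^{(t)}(\lambda_{ij}^{(t)}) \left( -x_{ij}^{(t)} \right)^+_{\lambda_{ij}^{(t)}}, \\
\dot{\lambda}_R &= m_R(\lambda_R) \left( -R \right)^+_{\lambda_R},
\end{aligned}
$$

where

$$
q_{ij}^{(t)} := p_i^{(t)} - p_j^{(t)}, \qquad q_R := -\sum_{t \in \mathcal{T}} p_s^{(t)}, \qquad y_i^{(t)} := \sum_{\{j \mid (i,j) \in \mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in \mathcal{A}\}} x_{ji}^{(t)} - \sigma_i^{(t)}.
$$

It can be shown using similar arguments as those for Proposition 1 that this algorithm is globally, asymptotically stable. In addition, by letting the source $s$ keep track of the rate $R$, it can be seen that the algorithm is decentralized.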
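A minimal numerical sketch of these elastic-rate dynamics, under assumptions of our own: a single arc $(s,t)$ with one sink, utility $U_r(R) = \log(1+R)$, cost $f(z) = z^2$, unit gains, a constant step size of 0.01, $p_t$ held at 0 because the sink's constraint is dropped, and multipliers clamped at zero after each discrete step. Utility minus cost $\log(1+R) - R^2$ is maximized where $1/(1+R) = 2R$, i.e. at $R = (\sqrt{3}-1)/2$, and the iteration should settle there with $x = R$.

```python
import math


def plus(y, x):
    """(y)^+_x: y if x > 0, else max(y, 0)."""
    return y if x > 0 else max(y, 0.0)


def elastic_single_arc(step=0.01, iterations=20000):
    x = R = p_s = 0.0          # flow on (s, t), rate, price at the source
    lam_x = lam_R = 0.0        # multipliers for x >= 0 and R >= 0
    for _ in range(iterations):
        q_st = p_s             # q_st = p_s - p_t with p_t = 0 (assumption)
        q_R = -p_s             # q_R = -sum_t p_s^(t), one sink here
        y_s = x - R            # net outflow at s minus sigma_s = R
        dU_dx = -2.0 * x       # -f'(z'_st), with z'_st = x for a single sink
        dU_dR = 1.0 / (1.0 + R)
        new_x = x + step * (dU_dx - q_st + lam_x)
        new_R = R + step * (dU_dR - q_R + lam_R)
        p_s += step * y_s
        lam_x = max(lam_x + step * plus(-x, lam_x), 0.0)
        lam_R = max(lam_R + step * plus(-R, lam_R), 0.0)
        x, R = new_x, new_R
    return x, R


x, R = elastic_single_arc()
print(R, (math.sqrt(3) - 1) / 2)
```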


III. Wireless packet networks 
To model wireless packet networks, we take the model for 
wireline packet networks and include the effect of two new 
factors: link lossiness and link broadcast. Link lossiness refers 
to the dropping or loss of packets as they are transmitted over 
a link; and link broadcast refers to how links, rather than 
necessarily being point-to-point, may originate from a single 
node and reach more than one other node. Our model includes 
networks consisting of lossy point-to-point links and networks 
consisting of lossless broadcast links as special cases. 

We represent the network with a directed hypergraph $\mathcal{H} = (\mathcal{N}, \mathcal{A})$, where $\mathcal{N}$ is the set of nodes and $\mathcal{A}$ is the set of hyperarcs. A hypergraph is a generalization of a graph, where, rather than arcs, we have hyperarcs. A hyperarc is a pair $(i, J)$, where $i$, the start node, is an element of $\mathcal{N}$ and $J$, the set of end nodes, is a non-empty subset of $\mathcal{N}$. Each hyperarc $(i, J)$ represents a lossy broadcast link from node $i$ to nodes in the non-empty set $J$. We denote by $z_{iJ}$ the rate at which coded packets are injected into hyperarc $(i, J)$, and we denote by $z_{iJK}$ the rate at which packets, injected into hyperarc $(i, J)$, are received by exactly the set of nodes $K \subset J$. Hence $z_{iJ} := \sum_{K \subset J} z_{iJK}$. Let

$$
b_{iJK} := \frac{\sum_{\{L \subset J \mid L \cap K \neq \emptyset\}} z_{iJL}}{z_{iJ}},
$$

so that $b_{iJK}$ is the fraction of packets injected into $(i, J)$ that reach at least one node in $K$.

The rate vector $z$, consisting of $z_{iJ}$, $(i, J) \in \mathcal{A}$, is called a subgraph, and we assume that it must lie within a constraint set $\mathcal{Z}$ for, if not, the packet queues associated with one or more hyperarcs become unstable (for examples of constraint sets $\mathcal{Z}$ that pertain specifically to multi-hop wireless networks, see [42], [43], [44], [45], [46], [47]). We reasonably assume that $\mathcal{Z}$ is a convex subset of the positive orthant containing the origin. We associate with the network a cost function $f$ (reflecting, for example, the average latency or energy consumption) that maps valid rate vectors to real numbers and that we seek to minimize.
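To make $b_{iJK}$ concrete, the sketch below computes the reception profile $z_{iJK}/z_{iJ}$ of one hyperarc under an illustrative loss model of our own choosing (each end node misses each packet independently, with a made-up probability) and then evaluates $b_{iJK}$ as the fraction of packets reaching at least one node of $K$. Under that model $b_{iJK} = 1 - \prod_{j \in K} \epsilon_j$, which the usage line prints for comparison; with all erasure probabilities set to zero it gives $b_{iJK} = 1$ for every non-empty $K$, as in the lossless case.

```python
from itertools import combinations


def reception_fractions(J, erasure):
    """z_iJK / z_iJ for every K subset of J, under an illustrative model in which
    end node j misses each packet independently with probability erasure[j]."""
    fractions = {}
    for r in range(len(J) + 1):
        for K in combinations(J, r):
            prob = 1.0
            for j in J:
                prob *= (1.0 - erasure[j]) if j in K else erasure[j]
            fractions[frozenset(K)] = prob
    return fractions


def b(fractions, K):
    """b_iJK: total fraction over received-sets L that intersect K."""
    K = set(K)
    return sum(frac for L, frac in fractions.items() if L & K)


J = ("j1", "j2", "j3")
fractions = reception_fractions(J, {"j1": 0.1, "j2": 0.2, "j3": 0.5})
print(b(fractions, {"j1", "j2"}), 1 - 0.1 * 0.2)
```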


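Suppose we have a source node $s$ wishing to transmit packets at a positive, real rate $R$ to a non-empty set of sink nodes $\mathcal{T}$. Consider the following optimization problem:

$$
\begin{aligned}
\text{minimize}\quad & f(z) \\
\text{subject to}\quad & z \in \mathcal{Z}, \\
& z_{iJ}\, b_{iJK} \ge \sum_{j \in K} x_{iJj}^{(t)}, \quad \forall\, (i,J) \in \mathcal{A},\ K \subset J,\ t \in \mathcal{T}, \\
& \sum_{\{J \mid (i,J) \in \mathcal{A}\}} \sum_{j \in J} x_{iJj}^{(t)} - \sum_{\{j \mid (j,I) \in \mathcal{A},\, i \in I\}} x_{jIi}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},\ t \in \mathcal{T}, \\
& x_{iJj}^{(t)} \ge 0, \quad \forall\, (i,J) \in \mathcal{A},\ j \in J,\ t \in \mathcal{T}.
\end{aligned}
\tag{15}
$$

Theorem 2: The vector $z$ is part of a feasible solution for the optimization problem (15) if and only if there exists a network code that sets up a multicast connection in the wireless network represented by hypergraph $\mathcal{H}$ at rate arbitrarily close to $R$ from source $s$ to sinks in the set $\mathcal{T}$ and that injects packets at rate arbitrarily close to $z_{iJ}$ on each hyperarc $(i, J)$.

Proof: The proof is much the same as that for Theorem 1. But, instead of Theorem 1 of [1], we use Theorem 2 of [6]. ■

In the lossless case, we have $b_{iJK} = 1$ for all non-empty $K \subset J$ and $b_{iJ\emptyset} = 0$. Hence, problem (15) simplifies to the following optimization problem.

$$
\begin{aligned}
\text{minimize}\quad & f(z) \\
\text{subject to}\quad & z \in \mathcal{Z}, \\
& z_{iJ} \ge \sum_{j \in J} x_{iJj}^{(t)}, \quad \forall\, (i,J) \in \mathcal{A},\ t \in \mathcal{T}, \\
& \sum_{\{J \mid (i,J) \in \mathcal{A}\}} \sum_{j \in J} x_{iJj}^{(t)} - \sum_{\{j \mid (j,I) \in \mathcal{A},\, i \in I\}} x_{jIi}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},\ t \in \mathcal{T}, \\
& x_{iJj}^{(t)} \ge 0, \quad \forall\, (i,J) \in \mathcal{A},\ j \in J,\ t \in \mathcal{T}.
\end{aligned}
\tag{16}
$$

A simplification of problem (16) can be made if we assume that, when nodes transmit in a lossless network, they reach all nodes in a certain area, with cost increasing as this area is increased. More precisely, suppose that we have separable cost, so $f(z) = \sum_{(i,J) \in \mathcal{A}} f_{iJ}(z_{iJ})$. Suppose further that each node $i$ has $M_i$ outgoing hyperarcs $(i, J_1^{(i)}), (i, J_2^{(i)}), \dots, (i, J_{M_i}^{(i)})$ with $J_1^{(i)} \subsetneq J_2^{(i)} \subsetneq \cdots \subsetneq J_{M_i}^{(i)}$. (We assume that there are no identical links, as duplicate links can effectively be treated as a single link.) Then, we assume that $f_{iJ_1^{(i)}}(\zeta) < f_{iJ_2^{(i)}}(\zeta) < \cdots < f_{iJ_{M_i}^{(i)}}(\zeta)$ for all $\zeta > 0$ and nodes $i$. For $(i,j) \in \mathcal{A}' := \{(i,j) \mid (i,J) \in \mathcal{A},\ J \ni j\}$, we introduce the variables

$$
x_{ij}^{(t)} := \sum_{m = m(i,j)}^{M_i} x_{iJ_m^{(i)} j}^{(t)},
$$

where $m(i,j)$ is the unique $m$ such that $j \in J_m^{(i)} \setminus J_{m-1}^{(i)}$ (we define $J_0^{(i)} := \emptyset$ for all $i \in \mathcal{N}$ for convenience). Now, problem (16) can be reformulated as the following optimization problem, which has substantially fewer variables.

$$
\begin{aligned}
\text{minimize}\quad & \sum_{(i,J) \in \mathcal{A}} f_{iJ}(z_{iJ}) \\
\text{subject to}\quad & z \in \mathcal{Z}, \\
& \sum_{r=m}^{M_i} z_{iJ_r^{(i)}} \ge \sum_{k \in J_{M_i}^{(i)} \setminus J_{m-1}^{(i)}} x_{ik}^{(t)}, \quad \forall\, i \in \mathcal{N},\ m = 1, \dots, M_i,\ t \in \mathcal{T}, \\
& \sum_{\{j \mid (i,j) \in \mathcal{A}'\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in \mathcal{A}'\}} x_{ji}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},\ t \in \mathcal{T}, \\
& x_{ij}^{(t)} \ge 0, \quad \forall\, (i,j) \in \mathcal{A}',\ t \in \mathcal{T}.
\end{aligned}
\tag{17}
$$

Proposition 2: Suppose that $f(z) = \sum_{(i,J) \in \mathcal{A}} f_{iJ}(z_{iJ})$ and that $f_{iJ_1^{(i)}}(\zeta) < f_{iJ_2^{(i)}}(\zeta) < \cdots < f_{iJ_{M_i}^{(i)}}(\zeta)$ for all $\zeta > 0$ and nodes $i$. Then problem (16) and problem (17) are equivalent in the sense that they have the same optimal cost and $z$ is part of an optimal solution for (16) if and only if it is part of an optimal solution for (17).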

Proof: See Appendix IV. ■

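We see that, provided that $\{b_{iJK}\}$ are constant, problems (15) and (16) are of essentially the same form as problem (1), albeit with possibly more linear constraints relating $z$ and $x$, and, if we drop the constraint set $\mathcal{Z}$ and consider linear, separable cost or convex, separable cost, then the decentralized algorithms discussed in Sections II-A and II-B can be applied with little modification. In the case of problem (17), the subgradient method of Section II-A can be applied once we note that its Lagrangian dual,

$$
\begin{aligned}
\text{maximize}\quad & \sum_{t \in \mathcal{T}} q^{(t)}(p^{(t)}) \\
\text{subject to}\quad & \sum_{t \in \mathcal{T}} s_{iJ_m^{(i)}}^{(t)} = a_{iJ_m^{(i)}} - a_{iJ_{m-1}^{(i)}}, \quad \forall\, i \in \mathcal{N},\ m = 1, \dots, M_i, \\
& s_{iJ}^{(t)} \ge 0, \quad \forall\, (i,J) \in \mathcal{A},\ t \in \mathcal{T},
\end{aligned}
$$

where $a_{iJ}$ is the cost per unit rate of hyperarc $(i, J)$, with $a_{iJ_0^{(i)}} := 0$,

$$
p_{ij}^{(t)} := \sum_{m=1}^{m(i,j)} s_{iJ_m^{(i)}}^{(t)}, \quad \forall\, (i,j) \in \mathcal{A}',\ t \in \mathcal{T},
$$

and

$$
q^{(t)}(p^{(t)}) := \min_{x^{(t)} \in \mathcal{F}^{(t)}} \sum_{(i,j) \in \mathcal{A}'} p_{ij}^{(t)} x_{ij}^{(t)},
$$

is of the same form as (4).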


IV. Comparison with techniques in routed packet networks

In this section, we report on the results of several
simulations that we conducted to assess the performance of the
proposed techniques. We begin with wireline networks.

In routed wireline networks, the standard approach to establishing minimum-cost multicast connections is to find the shortest tree rooted at the source that reaches all the sinks, which equates to solving the Steiner tree problem on directed graphs [10]. For coded networks, the analogous problem to finding the shortest tree is solving the linear optimization problem (3) in the case where $c_{ij} = +\infty$ for all $(i,j) \in \mathcal{A}$, which, being a linear optimization problem, admits a polynomial-time solution. By contrast, the Steiner tree problem on directed graphs is well known to be NP-complete. Although tractable approximation algorithms exist for the Steiner tree problem on directed graphs (for example, [10], [11], [12]), the solutions thus obtained are suboptimal relative to minimum-cost multicast without coding, which in turn is suboptimal relative to when coding is used, since coding subsumes forwarding and replicating (for example, the optimal cost for a Steiner tree in the network in Figure 1(a) is 10, as opposed to 19/2). Thus, coding promises potentially significant cost improvements.

We conducted simulations where we took graphs representing various Internet Service Provider (ISP) networks and assessed the average total weight of random multicast connections using, first, our proposed network-coding based solution and, second, routing over the tree given by the Directed Steiner Tree (DST) approximation algorithm described in [11]. The graphs, and their associated link weights, were obtained from the Rocketfuel project of the University of Washington [48]. The approximation algorithm in [11] was chosen for comparison because it achieves a poly-logarithmic approximation ratio of $O(\log^2 |T|)$, where $|T|$ is the number of sink nodes. This is roughly as good as can be expected from any practical algorithm, since it has been shown that it is highly unlikely that there exists a



Fig. 2. Average energy of a random 4-terminal multicast of unit rate in a 30-node wireless network using the subgradient method of Section III-B1. Nodes were placed randomly within a 10 × 10 square with a radius of connectivity of 3. The energy required to transmit at rate $z$ to a distance $d$ was taken to be $d^2 z$. Source and sink nodes were selected according to a uniform distribution over all possible selections.


polynomial-time algorithm that can achieve an approximation factor smaller than logarithmic [10]. The results of the simulations are tabulated in Table I. We see that, depending on the network and the size of the multicast group, the average cost reduction ranges from 10% to 33%. Though these reductions are modest, it is important to keep in mind that our proposed solution easily accommodates decentralized operation.

For wireless networks, one specific problem of interest is that of minimum-energy multicast (see, for example, [13], [49]). In this problem, we wish to achieve minimum-energy multicast in a lossless wireless network without explicit regard for throughput or bandwidth, so the constraint set $Z$ can be dropped altogether. The cost function is linear and separable, namely $f(z) = \sum_{(i,J)\in\mathcal{A}} a_{iJ} z_{iJ}$, where $a_{iJ}$ represents the energy required to transmit a packet from node $i$ to the nodes in $J$. Hence our formulation of this problem becomes a linear optimization problem with a polynomial number of constraints, which can therefore be solved in polynomial time. By contrast, the same problem using traditional routing-based approaches is NP-complete. In fact, the special case of broadcast is in itself NP-complete, a result shown in [49], [50]. The problem must therefore be addressed using polynomial-time heuristics such as the MIP algorithm proposed in [13].

We conducted simulations where we placed nodes randomly, according to a uniform distribution, in a 10 × 10 square with a radius of connectivity of 3. We then assessed the average total energy of random multicast connections using, first, our proposed network-coding based solution and, second, the routing solution given by the MIP algorithm. The energy required to transmit at rate $z$ to a distance $d$ was taken to be $d^2 z$. The results of the simulations are tabulated in Table II. We see that, depending on the size of the network and the size of the multicast group, the average energy reduction ranges from 13% to 49%. These reductions are more substantial than those for the wireline simulations, but are still modest. Again, it is important to keep in mind that the proposed solution easily accommodates decentralized operation.
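A minimal Python sketch of the wireless cost model used in these simulations follows. It assumes nodes are placed uniformly in a 10 × 10 square, links exist within radius 3, and transmitting at rate z over distance d costs d²z. A single transmission that reaches the farthest node of J also reaches the rest. The per-packet energy a_iJ of hyperarc (i, J) is therefore taken here as the squared distance to the farthest node in J. Enumerating the nested sets J_m of the m nearest neighbours is one illustrative way to list the useful hyperarcs. The helper names (`place_nodes`, `nested_hyperarcs`, `energy`) are hypothetical, and this is not the code behind Table II.

```python
import math
import random

def place_nodes(n, side=10.0, seed=0):
    """Place n nodes uniformly at random in a side x side square."""
    rng = random.Random(seed)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]

def nested_hyperarcs(pos, radius=3.0):
    """Map node i to [(J_1, a_iJ_1), (J_2, a_iJ_2), ...] for its nested hyperarcs.

    J_m holds the m nearest neighbours within `radius`. Under the d^2 z energy
    model, one transmission reaching the farthest of them reaches all of J_m,
    so a_iJ_m is the squared distance to the m-th nearest neighbour.
    """
    arcs = {}
    for i, (xi, yi) in enumerate(pos):
        dists = sorted((math.hypot(xi - xj, yi - yj), j)
                       for j, (xj, yj) in enumerate(pos) if j != i)
        nbrs = [(d, j) for d, j in dists if d <= radius]
        arcs[i] = [(frozenset(j for _, j in nbrs[:m + 1]), d * d)
                   for m, (d, _) in enumerate(nbrs)]
    return arcs

def energy(z, arcs):
    """Linear, separable cost f(z) = sum of a_iJ * z_iJ over the used hyperarcs."""
    cost = {(i, J): a for i, lst in arcs.items() for J, a in lst}
    return sum(cost[key] * rate for key, rate in z.items())

pos = place_nodes(30)
arcs = nested_hyperarcs(pos)
print(sum(len(lst) for lst in arcs.values()), "hyperarcs")
```

Only the nested sets are listed because, under this cost model, any other subset J costs the same as the smallest nested set containing it.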

We conducted simulations on our decentralized algorithms 
                                     Average multicast cost
Network         Approach             2 sinks   4 sinks   8 sinks   16 sinks
Telstra (au)    DST approximation       17.0      28.9      41.7       62.8
                Network coding          13.5      21.5      32.8       48.0
Sprint (us)     DST approximation       30.2      46.5      71.6      127.4
                Network coding          22.3      35.5      56.4      103.6
Ebone (eu)      DST approximation       28.2      43.0      69.7      115.3
                Network coding          20.7      32.4      50.4       77.8
Tiscali (eu)    DST approximation       32.6      49.9      78.4      121.7
                Network coding          24.5      37.7      57.7       81.7
Exodus (us)     DST approximation       43.8      62.7      91.2      116.0
                Network coding          33.4      49.1      68.0       92.9
Abovenet (us)   DST approximation       27.2      42.8      67.3       75.0
                Network coding          21.8      33.8      60.0       67.3

TABLE I
Average cost of random multicast connections of unit rate for various approaches in graphs representing various ISP networks. The cost per unit rate on each arc is the link weight as assessed by the Rocketfuel project of the University of Washington [48]. Source and sink nodes were selected according to a uniform distribution over all possible selections.


                                    Average multicast energy
Network size   Approach             2 sinks   4 sinks   8 sinks   16 sinks
20 nodes       MIP algorithm           30.6      33.8      41.6       47.4
               Network coding          15.5      23.3      29.9       38.1
30 nodes       MIP algorithm           26.8      31.9      37.7       43.3
               Network coding          15.4      21.7      28.3       37.8
40 nodes       MIP algorithm           24.4      29.3      35.1       42.3
               Network coding          14.5      20.6      25.6       30.5
50 nodes       MIP algorithm           22.6      27.3      32.8       37.3
               Network coding          12.8      17.7      25.3       30.3

TABLE II
Average energy of random multicast connections of unit rate for various approaches in random wireless networks of varying size. Nodes were placed randomly within a 10 × 10 square with a radius of connectivity of 3. The energy required to transmit at rate $z$ to a distance $d$ was taken to be $d^2 z$. Source and sink nodes were selected according to a uniform distribution over all possible selections.
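As an arithmetic check of the percentage ranges quoted in the text, the sketch below recomputes the relative saving 1 − (coded cost)/(routed cost) for every entry of Tables I and II. The figures are transcribed from the tables. The dictionary layout and the function name are illustrative only.

```python
# Average costs transcribed from Table I (DST approximation vs. network coding)
# and Table II (MIP algorithm vs. network coding); columns are 2, 4, 8, 16 sinks.
TABLE_I = {
    "Telstra (au)":  ([17.0, 28.9, 41.7, 62.8], [13.5, 21.5, 32.8, 48.0]),
    "Sprint (us)":   ([30.2, 46.5, 71.6, 127.4], [22.3, 35.5, 56.4, 103.6]),
    "Ebone (eu)":    ([28.2, 43.0, 69.7, 115.3], [20.7, 32.4, 50.4, 77.8]),
    "Tiscali (eu)":  ([32.6, 49.9, 78.4, 121.7], [24.5, 37.7, 57.7, 81.7]),
    "Exodus (us)":   ([43.8, 62.7, 91.2, 116.0], [33.4, 49.1, 68.0, 92.9]),
    "Abovenet (us)": ([27.2, 42.8, 67.3, 75.0], [21.8, 33.8, 60.0, 67.3]),
}
TABLE_II = {
    "20 nodes": ([30.6, 33.8, 41.6, 47.4], [15.5, 23.3, 29.9, 38.1]),
    "30 nodes": ([26.8, 31.9, 37.7, 43.3], [15.4, 21.7, 28.3, 37.8]),
    "40 nodes": ([24.4, 29.3, 35.1, 42.3], [14.5, 20.6, 25.6, 30.5]),
    "50 nodes": ([22.6, 27.3, 32.8, 37.3], [12.8, 17.7, 25.3, 30.3]),
}

def reduction_range(table):
    """Smallest and largest relative saving 1 - coded/routed over all entries."""
    savings = [1 - c / r
               for routed, coded in table.values()
               for r, c in zip(routed, coded)]
    return min(savings), max(savings)

for name, table in (("Table I", TABLE_I), ("Table II", TABLE_II)):
    lo, hi = reduction_range(table)
    print(f"{name}: {100 * lo:.1f}% to {100 * hi:.1f}%")
# prints "Table I: 10.3% to 32.9%" and "Table II: 12.7% to 49.3%"
```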



Fig. 3. Average energy of a random 4-terminal multicast of unit rate in a 30-node wireless network using the primal-dual method of Section III-B2. Nodes were placed randomly within a 10 × 10 square with a radius of connectivity of 3. The energy required to transmit at rate $z$ to a distance $d$ was taken to be $d^2 e^z$. Source and sink nodes were selected according to a uniform distribution over all possible selections.

for a network of 30 nodes and a multicast group of 4 terminals under the same setup. In Figure 2, we show the average behavior of the subgradient method of Section III-B1 applied to problem (17). The algorithm was run under two choices of step sizes and convex combination weights. The curve labeled “original primal recovery” refers to the case where the step sizes are given by $\theta[n]$ = ® and the convex combination weights by $\mu_l[n] = 1/n$. The curve labeled “modified primal recovery” refers to the case where the step sizes are given by $\theta[n]$ = ® and the convex combination weights by $\mu_l[n] = 1/n$ if $n < 30$, and $\mu_l[n] = 1/30$ if $n > 30$. The modified primal recovery rule was chosen as a heuristic to lessen the effect of poor primal solutions obtained in early iterations. For reference, the figure also shows the optimal cost of problem (17) and the cost obtained by the MIP algorithm. For both choices of step sizes and convex combination weights, the cost after the first iteration is already lower than that from the MIP algorithm. Moreover, in fewer than 50 iterations, the cost using modified primal recovery is within 5% of the optimal value. Thus, in a small number of iterations, the subgradient method yields significantly lower energy consumption than the MIP algorithm, which is centralized.
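With the “original primal recovery” weights $\mu_l[n] = 1/n$, the primal iterate $x[n] = \sum_{l=1}^{n} \mu_l[n]\, x'[l]$ is simply a running average of the subproblem solutions $x'[l]$. Combined with $z_{ij}[n] := \max_{t\in T} x_{ij}^{(t)}[n]$ (the construction used in Appendix I), it can be maintained incrementally. The sketch below assumes each iteration supplies one flow dictionary per sink. The class and method names are hypothetical, and the modified rule is not modeled.

```python
class RunningPrimalRecovery:
    """Original primal recovery: x[n] = (1/n) * sum_{l<=n} x'[l], per sink and arc."""

    def __init__(self, sinks):
        self.n = 0
        self.avg = {t: {} for t in sinks}   # avg[t][(i, j)] holds x_ij^(t)[n]

    def update(self, x_prime):
        """Fold in the iteration-n flows x_prime[t][(i, j)] and return z[n].

        z_ij[n] is the maximum over sinks t of x_ij^(t)[n]; absent arcs carry 0.
        """
        self.n += 1
        for t, avg in self.avg.items():
            flow = x_prime.get(t, {})
            for arc in set(avg) | set(flow):
                old = avg.get(arc, 0.0)
                # Incremental mean: new = old + (sample - old) / n.
                avg[arc] = old + (flow.get(arc, 0.0) - old) / self.n
        z = {}
        for avg in self.avg.values():
            for arc, value in avg.items():
                z[arc] = max(z.get(arc, 0.0), value)
        return z

rec = RunningPrimalRecovery(sinks=[3, 4])
rec.update({3: {(1, 3): 1.0}, 4: {(1, 2): 1.0, (2, 4): 1.0}})
print(rec.update({3: {(1, 2): 1.0, (2, 3): 1.0}, 4: {(1, 2): 1.0, (2, 4): 1.0}}))
```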

In Figure 3, we show the average behavior of the primal-dual method of Section III-B2 applied to problem (16). To make the cost strictly convex, the energy required to transmit at rate $z$ to a distance $d$ was taken to be $d^2 e^z$. Recall that we do not necessarily have a feasible solution at each iteration. Thus, to compare the cost at the end of each iteration, we recover a feasible solution from the vector $z'[m]$ as follows. We take the subgraph defined by $z'[m]$ and compute the maximum flow from source $s$ to sinks in the set $T$. We then find any subgraph of $z'[m]$ that provides this maximum flow and scale the subgraph so obtained to provide the desired flow. The cost of the scaled subgraph is taken to be the cost of the solution at the end of each iteration. We chose the step sizes as follows:
afj^[m\ = a, = 20q:, and was chosen to be 

large. The algorithm was run under two choices of $\alpha$. Our results show that the value of $\alpha$ has to be chosen carefully: larger values of $\alpha$ generally lead to more oscillatory behavior but faster convergence.
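One way to read the feasible-solution recovery step is sketched below, under several assumptions:

- The achievable multicast rate of $z'[m]$ is taken as the smallest $s$–$t$ max-flow over the sinks, which is the max-flow bound that network coding attains.
- Each sink's max flow is scaled down to that common value.
- The arc-wise maximum of these scaled flows is used as the subgraph.
- The result is scaled to the desired rate.

The Edmonds–Karp routine and the function names are illustrative, not the authors' code.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow; cap maps arc (i, j) -> capacity.

    Returns (value, flow), where flow maps each arc of cap to its (net) flow.
    """
    residual = dict(cap)
    for (i, j) in cap:
        residual.setdefault((j, i), 0.0)    # reverse residual arcs
    adj = {}
    for (i, j) in residual:
        adj.setdefault(i, []).append(j)
    value = 0.0
    while True:
        parent = {s: None}                  # BFS for a shortest augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and residual[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= delta
            residual[(v, u)] += delta
        value += delta
    flow = {e: max(0.0, c - residual[e]) for e, c in cap.items()}
    return value, flow

def recover_feasible(z_prime, s, sinks, rate):
    """Scale a subgraph of z' so it carries `rate` to every sink (None if a sink is cut off)."""
    results = [max_flow(z_prime, s, t) for t in sinks]
    r = min(value for value, _ in results)    # rate z' can multicast with coding
    if r <= 0:
        return None
    sub = {}
    for value, flow in results:
        for e, fe in flow.items():
            sub[e] = max(sub.get(e, 0.0), fe * r / value)   # cut each sink's flow to r
    return {e: ze * rate / r for e, ze in sub.items() if ze > 0}

z_iter = {(1, 2): 0.5, (1, 3): 0.25, (2, 4): 0.5, (3, 4): 0.25}
print(recover_feasible(z_iter, s=1, sinks=[2, 4], rate=1.0))
```

Scaling only the flow-carrying part, rather than all of $z'[m]$, avoids paying for capacity that no sink uses.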

Finally, we considered unicast in lossy wireless networks. We conducted simulations where nodes were again placed randomly according to a uniform distribution over a square region. The size of the square was set to achieve unit node density. We considered a network where transmissions were subject to distance attenuation and Rayleigh fading, but not interference (owing to scheduling). So, when node $i$ transmits, the signal-to-noise ratio (SNR) of the signal received at node $j$ is $\gamma\, d(i,j)^{-\alpha}$. Here $\gamma$ is an exponentially-distributed random variable with unit mean, $d(i,j)$ is the distance between node $i$ and node $j$, and $\alpha$ is the path-loss exponent. We assumed that a packet transmitted by node $i$ is successfully received by node $j$ if the received SNR exceeds $\beta$, i.e. $\gamma\, d(i,j)^{-\alpha} > \beta$, where $\beta$ is a threshold that we took to be 1/4. If a packet is not successfully received, then it is completely lost.
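Under this channel model, a single transmission over distance d succeeds with probability Pr(γ d^(−α) > β) = exp(−β d^α), because γ is exponential with unit mean. Plain per-link retransmission therefore needs 1/exp(−β d^α) attempts per hop on average. The sketch below computes this probability and checks it by Monte Carlo. The path-loss exponent is written `alpha` here. Its value is not given in this passage, so `alpha = 2.0` is an assumption, as are all names in the code.

```python
import math
import random

BETA = 0.25   # SNR threshold beta = 1/4 from the text

def success_prob(d, alpha=2.0, beta=BETA):
    """P(gamma * d**(-alpha) > beta) with gamma ~ Exp(1), i.e. exp(-beta * d**alpha).

    alpha (path-loss exponent) = 2.0 is an assumed value for illustration.
    """
    return math.exp(-beta * d ** alpha)

def simulate(d, trials=100_000, alpha=2.0, beta=BETA, seed=1):
    """Monte Carlo estimate of the same probability under Rayleigh fading."""
    rng = random.Random(seed)
    hits = sum(rng.expovariate(1.0) * d ** -alpha > beta for _ in range(trials))
    return hits / trials

for d in (0.5, 1.0, 2.0):
    p = success_prob(d)
    print(f"d={d}: success {p:.3f}, mean transmissions per hop {1 / p:.2f}")
```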

We considered five different approaches to wireless unicast. Approaches 1)–3) do not use network coding, while approaches 4) and 5) do:

1) End-to-end retransmission: A path is chosen from source to sink, and packets are acknowledged by the sink, or destination node. If the acknowledgment for a packet is not received by the source, the packet is retransmitted. This represents the situation where reliability is provided by a retransmission scheme above the link layer, e.g., by the Transmission Control Protocol (TCP) at the transport layer, and no mechanism for reliability is present at the link layer.

2) End-to-end coding: A path is chosen from source to 
sink, and an end-to-end forward error correction (FEC) 
code, such as a Reed-Solomon code, an LT code [51], 
or a Raptor code [52], is used to correct for packets lost 
between source and sink. 

3) Link-by-link retransmission: A path is chosen from 
source to sink, and automatic repeat request (ARQ) is 
used at the link layer to request the retransmission of 
packets lost on every link in the path. Thus, on every 
link, packets are acknowledged by the intended receiver 
and, if the acknowledgment for a packet is not received 
by the sender, the packet is retransmitted. 

4) Path coding: A path is chosen from source to sink, 
and every node on the path employs coding to correct 
for lost packets. The most straightforward way of doing 
this is for each node to use one of the FEC codes for 
end-to-end coding, decoding and re-encoding packets it 
receives. The main drawback of such an approach is 
delay. Every node on the path codes and decodes packets 
in a block. A way of overcoming this drawback is to 
use codes that operate in more of a “convolutional”



Fig. 4. Average number of transmissions required per packet using various 
wireless unicast approaches in random networks of varying size. Sources and 
sinks were chosen randomly according to a uniform distribution. 


manner, sending out coded packets formed from packets 
received thus far, without decoding. The random linear 
coding scheme from [6] is such a code. A variation, with 
lower complexity, is presented in [53]. 

5) Full coding: In this case, paths are eschewed altogether. The minimum-cost subgraph problem is solved to find a subgraph, and the random linear coding scheme from [6] is used. This represents the limit of achievability, provided that we are restricted from modifying the design of the physical layer and that we do not exploit the timing of packets to convey information.

In all cases where acknowledgments are sent, acknowledgments are subject to loss in the same way that packets are and follow the same path.
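The following sketch shows why link-by-link retransmission already improves on end-to-end retransmission. Consider a path whose links succeed independently with probabilities p_1, …, p_n. Unlike the simulations above, it assumes acknowledgments are never lost. Per-link ARQ then needs Σ 1/p_l transmissions per packet on average. An end-to-end attempt uses link k only if the packet survived links 1, …, k−1, and it must be repeated until the packet survives all n links. Wald's identity then gives the ratio computed below. Both function names are hypothetical.

```python
from math import prod

def link_by_link(p):
    """Mean transmissions per packet with per-link ARQ (ACKs assumed lossless)."""
    return sum(1.0 / pl for pl in p)

def end_to_end(p):
    """Mean transmissions per packet with end-to-end retransmission (ACKs assumed lossless).

    An attempt transmits on link k (1-indexed) only if links 1..k-1 all succeeded,
    so one attempt costs sum_k prod(p_1..p_{k-1}) transmissions on average; attempts
    repeat until all links succeed (probability prod(p)), and Wald's identity
    multiplies the expected attempts by the expected cost per attempt.
    """
    per_attempt = sum(prod(p[:k]) for k in range(len(p)))
    return per_attempt / prod(p)

path = [0.9, 0.8, 0.7]
print(link_by_link(path), end_to_end(path))
```

Coding on the path, approach 4), can match the per-link cost without per-packet feedback. This sketch does not model that case.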

The average number of transmissions required per packet using the various approaches in random networks of varying size is shown in Figure 4. In each random instance, paths or subgraphs were chosen to minimize the total number of transmissions required. The exceptions are end-to-end retransmission and end-to-end coding, where they were chosen to minimize the number of transmissions required by the source node, because minimizing the total number of transmissions in these cases cannot be done straightforwardly by a shortest path algorithm. End-to-end coding and link-by-link retransmission already represent significant improvements on end-to-end retransmission, but the network coding approaches improve more still. By a network size of nine nodes, full coding already improves on link-by-link retransmission by a factor of two. Moreover, as the network size grows, the performance of the various schemes diverges. Here, we measure performance simply by the number of transmissions required per packet. In some cases (e.g., under congestion), the relevant performance measure grows super-linearly in this quantity, and the performance improvement is even greater than that depicted in Figure 4. In any case, the use of network coding promises significant improvements, particularly for large networks.
V. Dynamic multicast

In many applications, membership of the multicast group changes in time, with nodes joining and leaving the group, rather than remaining constant for the duration of the connection, as we have thus far assumed. Under these dynamic conditions, we often cannot simply re-establish the connection with every membership change, because doing so would cause an unacceptable disruption in the service being delivered to those nodes remaining in the group. A good example of an application where such issues arise is real-time media distribution. Thus, we desire to find minimum-cost time-varying subgraphs that can deliver continuous service to dynamic multicast groups.

Although our objective is clear, our description of the problem is currently vague. Indeed, one of the principal hurdles to tackling dynamic multicast lies in formulating the problem so that it is suitable for analysis and addresses our objective. For routed networks, the problem is generally formulated as the dynamic Steiner tree problem, which was first proposed in [14]. Under this formulation, the focus is on worst-case behavior, and the multicast tree may be modified only when nodes join or leave the multicast group. The formulation is adequate but not compelling: there is no fundamental reason to restrict when the multicast tree can be modified.

In our formulation for coded networks, we draw some inspiration from [14], but we focus on expected behavior rather than worst-case behavior. We also do not restrict modifications of the multicast subgraph to the times when nodes join or leave the multicast group. We focus on wireline networks for simplicity, though our considerations apply equally to wireless networks. We formulate the problem as follows.

We employ a basic unit of time that is related to the time it takes for changes in the multicast subgraph to settle. In particular, suppose that at a given time the multicast subgraph is $z$ and that it is capable of supporting a multicast connection to sink nodes $T$. Then, in one unit of time, we can change the multicast subgraph to $z'$, which supports a multicast connection to sink nodes $T'$. This change does not disrupt the service being delivered to $T \cap T'$ provided that, componentwise, $z \ge z'$ or $z \le z'$.

This assumption allows two kinds of change in one time unit:

- The subgraph increases. Any sink node receiving a particular stream continues to receive it, albeit with possible changes in the code depending on how the coding is implemented, and so faces no significant disruption to service.
- The subgraph decreases. Any sink node receiving a particular stream is forced down to a subset of that stream. That subset is still sufficient to recover the source's transmission provided that the sink node is in $T'$, so again there is no significant disruption to service.

We do not allow both operations to take place in a single unit of time, which would permit arbitrary changes. In that case, sink nodes might face temporary disruptions to service when decreases to the multicast subgraph follow too closely on increases.

Fig. 5. A four-node network.

As an example, consider the four-node network shown in Figure 5. Suppose that $s = 1$ and that, at a given time, we have $T = \{2, 4\}$. We support a multicast of unit rate with the subgraph

$$ (z_{12}, z_{13}, z_{24}, z_{34}) = (1, 0, 1, 0). $$

Now suppose that the group membership changes, and node 2 leaves while node 3 joins, so $T' = \{3, 4\}$. As a result, we decide to change to the subgraph

$$ (z_{12}, z_{13}, z_{24}, z_{34}) = (0, 1, 0, 1). $$

If we simply make the change naively in a single time unit, node 4 may suffer a temporary disruption to its service in the gap between packets on $(2,4)$ ceasing to arrive and packets on $(3,4)$ starting to arrive. Our assumption on allowed operations means that we must first increase the subgraph to

$$ (z_{12}, z_{13}, z_{24}, z_{34}) = (1, 1, 1, 1), $$

allow the change to settle by waiting for one time unit, and then decrease the subgraph to

$$ (z_{12}, z_{13}, z_{24}, z_{34}) = (0, 1, 0, 1). $$

With this series of operations, node 4 maintains continuous service throughout the subgraph change.
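The allowed-operation rule amounts to a componentwise test on subgraphs. The four-node example suggests a general two-step plan: first raise every arc to the larger of its old and new rates, then, one time unit later, lower it to the new rate. The sketch below implements this with hypothetical helper names. It assumes the intermediate subgraph stays within the constraint set, which holds when the constraints are separable per-arc capacity bounds.

```python
def leq(a, b):
    """Componentwise a <= b; arcs missing from a dict carry rate 0."""
    return all(a.get(e, 0) <= b.get(e, 0) for e in set(a) | set(b))

def allowed(z, z_next):
    """A one-interval change is allowed only if it is a pure increase or a pure decrease."""
    return leq(z, z_next) or leq(z_next, z)

def two_step_plan(z, z_target):
    """Increase to the componentwise maximum, wait one unit, then decrease to the target."""
    peak = {e: max(z.get(e, 0), z_target.get(e, 0)) for e in set(z) | set(z_target)}
    return [peak, dict(z_target)]

# Four-node example: s = 1, moving from T = {2, 4} to T' = {3, 4}.
z_old = {(1, 2): 1, (1, 3): 0, (2, 4): 1, (3, 4): 0}
z_new = {(1, 2): 0, (1, 3): 1, (2, 4): 0, (3, 4): 1}
print(allowed(z_old, z_new))   # prints False: the change mixes increases and decreases
peak, final = two_step_plan(z_old, z_new)
print(allowed(z_old, peak), allowed(peak, final))   # prints True True
```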

We discretize the time axis into intervals of a single time unit. We suppose that at the beginning of each time interval, we receive zero or more join requests from sink nodes that are not currently part of the multicast group, and zero or more leave requests from sink nodes that are currently part of the group. We model these join and leave requests as a discrete stochastic process. We assume that, once all the members of the multicast group leave, the connection is over and remains in that state forever. Let $T_m$ denote the sink nodes in the multicast group at the end of time interval $m$. Then, we assume that

$$ \lim_{m\to\infty} \Pr\big(T_m \neq \emptyset \mid T_0 = T\big) = 0 \tag{18} $$

for any initial multicast group $T$. One simple model of join and leave requests treats $|T_m|$ as a birth-death process with a single absorbing state at 0. At each birth, a node is chosen uniformly from $\mathcal{N}' \setminus T_m$, where $\mathcal{N}' := \mathcal{N} \setminus \{s\}$; at each death, a node is chosen uniformly from $T_m$.
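The following toy simulation implements the birth-death join/leave model just described. The per-interval join and leave probabilities (`p_join`, `p_leave`) are hypothetical parameters. The source is excluded from N', a joining node is drawn uniformly from the non-members of N', and a leaving node is drawn uniformly from the current group. With `p_leave > 0`, the group empties with probability one, in line with condition (18).

```python
import random

def simulate_group(nodes, source, initial, p_join=0.3, p_leave=0.4,
                   seed=0, max_steps=10_000):
    """Return the sequence T_0, T_1, ... of multicast groups under a toy model.

    |T_m| evolves as a birth-death chain absorbed at 0. In each interval, with
    probability p_join a node drawn uniformly from the non-members of
    N' (all nodes except the source) joins; with probability p_leave a
    uniformly drawn member leaves. Both rates are hypothetical choices.
    """
    rng = random.Random(seed)
    others = sorted(set(nodes) - {source})
    group = set(initial)
    history = [frozenset(group)]
    while group and len(history) <= max_steps:
        u = rng.random()
        if u < p_join:
            outside = [v for v in others if v not in group]
            if outside:                          # no birth once everyone has joined
                group.add(rng.choice(outside))
        elif u < p_join + p_leave:
            group.remove(rng.choice(sorted(group)))
        history.append(frozenset(group))
    return history

hist = simulate_group(range(1, 6), source=1, initial={2, 4})
print(len(hist) - 1, "intervals until the group emptied")
```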

Let $z^{(m)}$ be the multicast subgraph at the beginning of time interval $m$. By the assumptions made thus far, it supports a multicast connection to sink nodes $T_{m-1}$. Let $V_{m-1}$ and $W_{m-1}$ be the join and leave requests, respectively, that arrive at the end of time interval $m-1$. Hence, $V_{m-1} \subset \mathcal{N}' \setminus T_{m-1}$, $W_{m-1} \subset T_{m-1}$, and $T_m = (T_{m-1} \setminus W_{m-1}) \cup V_{m-1}$. We choose $z^{(m+1)}$ from $z^{(m)}$ and $T_m$ using the function $\mu_m$, so $z^{(m+1)} = \mu_m(z^{(m)}, T_m)$, which must lie in a particular constraint set $U(z^{(m)}, T_m)$. To characterize the constraint set $U(z, T)$, recall the optimization problem for minimum-cost multicast in wireline packet networks developed in Section III:


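$$
\begin{aligned}
\text{minimize}\quad & f(z) \\
\text{subject to}\quad & z \in Z, \\
& z_{ij} \ge x_{ij}^{(t)} \ge 0, \quad \forall\, (i,j) \in \mathcal{A},\ t \in T, \\
& \sum_{\{j \mid (i,j)\in\mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j \mid (j,i)\in\mathcal{A}\}} x_{ji}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},\ t \in T.
\end{aligned} \tag{19}
$$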


Therefore, we can write $U(z, T) = U_+(z, T) \cup U_-(z, T)$, where

$$ U_+(z, T) = \{ z' \in Z(T) \mid z' \ge z \}, \qquad U_-(z, T) = \{ z' \in Z(T) \mid z' \le z \}, $$

and $Z(T)$ is the feasible set of problem (19) for a given $T$. That is, suppose we have the subgraph $z$ at the beginning of a time interval and must move to a subgraph that supports multicast to $T$. Then the allowable subgraphs are those that support multicast to $T$ and either increase $z$ (those in $U_+(z, T)$) or decrease $z$ (those in $U_-(z, T)$).

Note that, if we have separable constraints, then $U(z^{(m)}, T_m) \neq \emptyset$ for all $z^{(m)} \in Z$ provided that $Z(T_m) \neq \emptyset$. In other words, from any feasible subgraph at stage $m$, it is possible to move to a feasible subgraph at stage $m+1$ provided that one exists for the multicast group $T_m$. This holds for coded networks, but not always for routed networks. Indeed, if multiple multicast trees are being used (as discussed in [54], for example), we may find ourselves in a state where we cannot achieve multicast at stage $m+1$, even though static multicast to $T_m$ is possible using multiple multicast trees.

As an example of this phenomenon, consider the network depicted in the accompanying figure. Suppose that each arc is of unit capacity, that $s = 1$, and that, at a given time, we have $T = \{6, 8\}$. We support a multicast of rate 2 with the two trees $\{(1,3), (3,4), (4,5), (5,6), (5,7), (7,8)\}$ and $\{(1,2), (2,6), (6,8)\}$, each carrying unit rate. Now suppose that the group membership changes, and node 6 leaves while node 7 joins, so $T' = \{7, 8\}$. Static multicast to $T'$ is clearly possible using multiple multicast trees (we simply reflect the solution for $T$). However, we cannot achieve multicast to $T'$ by only adding edges to the two existing trees. Our only recourse at this stage is either to abandon the existing trees and establish new ones, which disrupts the service of node 8, or to slowly reconfigure the existing trees, which delays node 7's actual joining of the group.

Returning to the problem at hand, our objective is to find a policy $\pi = \{\mu_0, \mu_1, \ldots\}$ that minimizes the cost function

$$ J_\pi(z^{(0)}, T_0) = \lim_{M\to\infty} E\Big[ \sum_{m=0}^{M-1} f\big(\mu_m(z^{(m)}, T_m)\big)\, \chi_{2^{\mathcal{N}'}\setminus\{\emptyset\}}(T_m) \Big], $$

where $\chi_{2^{\mathcal{N}'}\setminus\{\emptyset\}}$ is the characteristic function for $2^{\mathcal{N}'} \setminus \{\emptyset\}$, i.e. $\chi_{2^{\mathcal{N}'}\setminus\{\emptyset\}}(T) = 1$ if $T \neq \emptyset$ and $\chi_{2^{\mathcal{N}'}\setminus\{\emptyset\}}(T) = 0$ if $T = \emptyset$.

We assume that we have separable constraints and that $Z(\mathcal{N}') \neq \emptyset$; that is, there exists a subgraph that supports broadcast. This assumption ensures that the constraint set $U(z, T)$ is non-empty for all $z \in Z$ and $T \subset \mathcal{N}'$. Thus, from condition (18), there exists at least one policy $\pi$ such that $J_\pi(z^{(0)}, T_0) < \infty$: namely, one that uses some fixed $z \in Z(\mathcal{N}')$ until the multicast group is empty.

It is now not difficult to see that we are dealing with an undiscounted, infinite-horizon dynamic programming problem (see, for example, [55, Chapter 3]), so we can apply the theory developed for such problems. In doing so, we first note that the optimal cost function $J^* := \min_\pi J_\pi$ satisfies Bellman's equation:

$$ J^*(z, T) = \min_{u \in U(z, T)} \big\{ f(u) + E[J^*(u, (T \setminus W) \cup V)] \big\} $$

if $T \neq \emptyset$, and $J^*(z, T) = 0$ if $T = \emptyset$. Here $V$ and $W$ denote the join and leave requests that arrive at the end of the current interval. Moreover, the optimal cost is achieved by the stationary policy $\pi = \{\mu, \mu, \ldots\}$, where $\mu$ is given by

$$ \mu(z, T) = \arg\min_{u \in U(z, T)} \big\{ f(u) + E[J^*(u, (T \setminus W) \cup V)] \big\} \tag{20} $$

if $T \neq \emptyset$, and $\mu(z, T) = 0$ if $T = \emptyset$.
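For the fixed-broadcast policy used above to show that some policy has finite cost, $J_\pi$ reduces to $f(z)$ times the expected number of intervals with a non-empty group. Assume a toy birth-death model for $|T_m|$ with hypothetical per-interval join and leave probabilities. That expectation then solves a tridiagonal linear system, sketched below with the Thomas algorithm. This only illustrates the finite-cost argument; it is not a method for computing $J^*$. The function name and the example numbers are hypothetical.

```python
def expected_active_intervals(birth, death):
    """Expected number of intervals with a non-empty group, from each size k = 1..n.

    birth[k-1] and death[k-1] are the one-step join/leave probabilities at size k
    (toy inputs, death > 0, birth[-1] == 0). With E_0 = 0 and, for k >= 1,
    E_k = 1 + b_k E_{k+1} + d_k E_{k-1} + (1 - b_k - d_k) E_k, i.e.
    (b_k + d_k) E_k - b_k E_{k+1} - d_k E_{k-1} = 1, solved by the Thomas algorithm.
    """
    n = len(birth)
    if birth[-1] != 0:
        raise ValueError("group size is capped at n, so birth[-1] must be 0")
    diag = [b + d for b, d in zip(birth, death)]
    upper = [-b for b in birth]          # coefficient of E_{k+1}
    lower = [-d for d in death]          # coefficient of E_{k-1} (E_0 = 0 drops out)
    c, r = [0.0] * n, [0.0] * n
    c[0], r[0] = upper[0] / diag[0], 1.0 / diag[0]
    for k in range(1, n):                # forward elimination
        m = diag[k] - lower[k] * c[k - 1]
        c[k] = upper[k] / m
        r[k] = (1.0 - lower[k] * r[k - 1]) / m
    e = [0.0] * n
    e[-1] = r[-1]
    for k in range(n - 2, -1, -1):       # back substitution
        e[k] = r[k] - c[k] * e[k + 1]
    return e

# Toy example: group size at most 3, joins w.p. 0.2 (none at size 3), leaves w.p. 0.5.
e = expected_active_intervals([0.2, 0.2, 0.0], [0.5, 0.5, 0.5])
broadcast_cost = 7.0                     # hypothetical f(z) of the fixed broadcast subgraph
print([broadcast_cost * ek for ek in e]) # J_pi for |T_0| = 1, 2, 3
```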

The fact that the optimal cost can be achieved by a stationary policy significantly limits the space in which we need to search for optimal policies. We are still left with the difficulty that the state space is uncountably large: it is the space of all possible pairs $(z, T)$, which is $Z \times 2^{\mathcal{N}'}$. The size of the state space more or less rules out techniques such as value iteration for obtaining $J^*$.

On the other hand, given $J^*$, it seems quite plausible that we can compute the optimal decision at the beginning of each time interval using (20). Indeed, the constraint set is the union of two polyhedra, which can be handled by optimizing over each separately. The objective function may not be convex even if $f$ is convex, owing to the term $E[J^*(u, (T \setminus W) \cup V)]$. At any rate, however, we cannot obtain $J^*$ precisely because of the large state space, so we can restrict our attention to approximations that make problem (20) tractable.

For dynamic programming problems, many approximations have been developed to cope with large state spaces (see, for example, [55, Section 2.3.3]). In particular, we can approximate $J^*(z, T)$ by $\tilde J(z, T, r)$. Here $\tilde J(z, T, r)$ has some fixed form, and $r$ is a parameter vector determined by some form of optimization, which can be performed offline if the graph $\mathcal{G}$ is static. Depending on the approximation used, we may even be able to solve problem (20) using the decentralized algorithms described in Section III (or simple modifications thereof). The specific approximations $\tilde J(z, T, r)$ that we can use, and their performance, are beyond the scope of this paper.


VI. Conclusion 

Routing is certainly a satisfactory way to operate packet 
networks. It clearly works, but it is not clear that it should 
be used for all types of networks. As we have mentioned, 
application-layer overlay networks and multi-hop wireless 
networks are two types of networks where coding is a definite 
alternative. 

To actually use coding, however, we must apply to coding the same considerations that we normally apply to routing. This paper did exactly that: we took the cost consideration from routed packet networks and applied it to coded packet networks. More specifically, we considered the problem of finding minimum-cost subgraphs to support multicast connections over coded packet networks, both wireline and wireless. As we saw, this problem is effectively decoupled from the coding problem. To establish minimum-cost multicast connections, we can first determine the rate at which to inject coded packets on each arc, and then determine the contents of those packets.

Our work therefore brings coded packet networks one step 
closer to realization. But, to actually see that happen, much 
work remains to be done. For example, designing protocols 
around our algorithms is a clear task, as is designing protocols 
to implement coding schemes. In addition, there are some 
important issues coming directly from this paper that require 
further exploration. Some of these relate to the decentralized 
algorithms, e.g., their stability under changing conditions (e.g., 
changing arc costs, changing graph topology), their speeds 
of convergence, their demands on computation and message- 
exchange, and their behavior under asynchronism. Another 
topic to explore is specific approximation methods for use in 
our formulation of dynamic multicast. 

On a broader level, we could design other algorithms using 
the flow formulations given in this paper (see [56], [57]). And 
we could give more thought to the cost functions themselves. 
Where do they come from? Do cost functions for routed 
packet networks make sense for coded ones? If a coded packet 
network is priced, how should the pricing be done? And how 
should the resultant cost be shared among the members of the 
multicast group? 

In short, we believe that realizing coded packet networks is a worthwhile goal, and we see our work as an integral step toward this goal. Much promising work, requiring various expertise, remains.
Appendix I

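We wish to solve the following problem:

$$
\begin{aligned}
\text{minimize}\quad & \sum_{t\in T} \big(v^{(t)} - u^{(t)}\big)^2 \\
\text{subject to}\quad & v \in P_{ij},
\end{aligned}
$$

where $P_{ij}$ is the $|T|$-dimensional simplex

$$ P_{ij} = \Big\{ v \;\Big|\; \sum_{t\in T} v^{(t)} = a_{ij},\ v \ge 0 \Big\}. $$

First, since the objective function and the constraint set $P_{ij}$ are both convex, it is straightforward to establish that a necessary and sufficient condition for global optimality of $\hat v$ in $P_{ij}$ is

$$ \hat v^{(t)} > 0 \;\Longrightarrow\; \hat v^{(t)} - u^{(t)} \le \hat v^{(r)} - u^{(r)}, \quad \forall\, r \in T \tag{21} $$

(see, for example, [32, Section 2.1]). Suppose we index the elements of $T$ such that $u^{(t_1)} \ge u^{(t_2)} \ge \cdots \ge u^{(t_{|T|})}$. There must be an index $k$ in the set $\{1, \ldots, |T|\}$ such that $\hat v^{(t_l)} > 0$ for $l = 1, \ldots, k$ and $\hat v^{(t_l)} = 0$ for $l \ge k+1$. Otherwise, a feasible solution with lower cost could be obtained by swapping components of the vector. Therefore, condition (21) implies that there must exist some $d$ such that $\hat v^{(t)} = u^{(t)} + d$ for all $t \in \{t_1, \ldots, t_k\}$ and that $d \le -u^{(t)}$ for all $t \in \{t_{k+1}, \ldots, t_{|T|}\}$, which is equivalent to $d \le -u^{(t_{k+1})}$. Since $\hat v$ is in the simplex $P_{ij}$, it follows that

$$ kd + \sum_{r=1}^{k} u^{(t_r)} = a_{ij}, $$

which gives

$$ d = \frac{1}{k}\Big(a_{ij} - \sum_{r=1}^{k} u^{(t_r)}\Big). $$

Take $k = \hat k$, where $\hat k$ is the smallest $k$ such that

$$ \frac{1}{k}\Big(a_{ij} - \sum_{r=1}^{k} u^{(t_r)}\Big) \le -u^{(t_{k+1})} $$

(or, if no such $k$ exists, $\hat k = |T|$). We then have

$$ \frac{1}{\hat k - 1}\Big(a_{ij} - \sum_{r=1}^{\hat k - 1} u^{(t_r)}\Big) > -u^{(t_{\hat k})}, $$

which can be rearranged to give

$$ d = \frac{1}{\hat k}\Big(a_{ij} - \sum_{r=1}^{\hat k} u^{(t_r)}\Big) > -u^{(t_{\hat k})}. $$

Hence, if $\hat v$ is given by

$$
\hat v^{(t)} =
\begin{cases}
u^{(t)} + \dfrac{1}{\hat k}\Big(a_{ij} - \sum_{r=1}^{\hat k} u^{(t_r)}\Big) & \text{if } t \in \{t_1, \ldots, t_{\hat k}\}, \\
0 & \text{otherwise},
\end{cases} \tag{22}
$$

then $\hat v$ is feasible and the optimality condition (21) is satisfied. Note that, since $d \le -u^{(t_{\hat k + 1})}$, equation (22) can also be written as

$$ \hat v^{(t)} = \max\Big(0,\ u^{(t)} + \frac{1}{\hat k}\Big(a_{ij} - \sum_{r=1}^{\hat k} u^{(t_r)}\Big)\Big). \tag{23} $$

We now turn to showing that any accumulation point of the sequence of primal iterates $\{x[n]\}$ is an optimal solution to the primal problem. Suppose that the dual feasible solution that the subgradient method converges to is $\hat p$. Then there exists some $m$ such that, for $n > m$,

$$ p_{ij}^{(t)}[n+1] = p_{ij}^{(t)}[n] + \theta[n]\, x_{ij}'^{(t)}[n] + d_{ij}[n] $$

for all $(i,j) \in \mathcal{A}$ and $t \in T$ such that $\hat p_{ij}^{(t)} > 0$. Here $d_{ij}[n]$ is the shift $d$ of (23) computed from $u^{(t)} = p_{ij}^{(t)}[n] + \theta[n]\, x_{ij}'^{(t)}[n]$. Therefore, if $\hat p_{ij}^{(t)} > 0$, then for $n > m$ we have

$$
\begin{aligned}
x_{ij}^{(t)}[n] &= \sum_{l=1}^{n} \mu_l[n]\, x_{ij}'^{(t)}[l] \\
&= \sum_{l=1}^{m} \mu_l[n]\, x_{ij}'^{(t)}[l] + \sum_{l=m+1}^{n} \frac{\mu_l[n]}{\theta[l]}\big(p_{ij}^{(t)}[l+1] - p_{ij}^{(t)}[l] - d_{ij}[l]\big) \\
&= \sum_{l=1}^{m} \mu_l[n]\, x_{ij}'^{(t)}[l] + \sum_{l=m+1}^{n} \gamma_{ln}\big(p_{ij}^{(t)}[l+1] - p_{ij}^{(t)}[l]\big) - \sum_{l=m+1}^{n} \gamma_{ln}\, d_{ij}[l],
\end{aligned} \tag{24}
$$

where $\gamma_{ln} := \mu_l[n]/\theta[l]$. Otherwise, if $\hat p_{ij}^{(t)} = 0$, then from equation (23) we have

$$ p_{ij}^{(t)}[n+1] \ge p_{ij}^{(t)}[n] + \theta[n]\, x_{ij}'^{(t)}[n] + d_{ij}[n], $$

so

$$ x_{ij}^{(t)}[n] \le \sum_{l=1}^{m} \mu_l[n]\, x_{ij}'^{(t)}[l] + \sum_{l=m+1}^{n} \gamma_{ln}\big(p_{ij}^{(t)}[l+1] - p_{ij}^{(t)}[l]\big) - \sum_{l=m+1}^{n} \gamma_{ln}\, d_{ij}[l]. \tag{25} $$

It is straightforward to see that the sequence of iterates $\{x[n]\}$ is primal feasible, and that we obtain a primal feasible sequence $\{z[n]\}$ by setting $z_{ij}[n] := \max_{t\in T} x_{ij}^{(t)}[n]$. Sherali and Choi [34] showed that, if the required conditions on the step sizes $\{\theta[n]\}$ and the convex combination weights $\{\mu_l[n]\}$ are satisfied, then

$$ \sum_{l=1}^{m} \mu_l[n]\, x_{ij}'^{(t)}[l] + \sum_{l=m+1}^{n} \gamma_{ln}\big(p_{ij}^{(t)}[l+1] - p_{ij}^{(t)}[l]\big) \to 0 $$

as $n \to \infty$. Hence, from equations (24) and (25), for $n$ sufficiently large,

$$ z_{ij}[n] = -\sum_{l=m+1}^{n} \gamma_{ln}\, d_{ij}[l], $$

and therefore complementary slackness with $\hat p$ holds in the limit of any convergent subsequence of $\{x[n]\}$.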
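The projection in equations (22) and (23) can be sketched directly: sort the components, find $\hat k$, shift the $\hat k$ largest components by $d$, and clip at zero. In the subgradient method, `u` would be $p_{ij}[n] + \theta[n]\, x'_{ij}[n]$, and the resulting shift plays the role of $d_{ij}[n]$. This is a minimal Python sketch, and the function name is hypothetical.

```python
def project_onto_simplex(u, a):
    """Euclidean projection of u onto {v : sum(v) = a, v >= 0}, following (22)-(23).

    Components are sorted in decreasing order; k_hat is the smallest k with
    (a - sum of the k largest) / k <= -(the (k+1)-th largest), or len(u) if none;
    then v_t = max(u_t + d, 0) with d = (a - sum of the k_hat largest) / k_hat.
    """
    if a < 0:
        raise ValueError("a must be non-negative")
    order = sorted(u, reverse=True)
    n = len(order)
    k_hat, running = n, 0.0
    for k in range(1, n):                # k = n is the fallback k_hat = |T|
        running += order[k - 1]
        if (a - running) / k <= -order[k]:
            k_hat = k
            break
    d = (a - sum(order[:k_hat])) / k_hat
    return [max(x + d, 0.0) for x in u]

print(project_onto_simplex([3.0, 1.0, -1.0], 1.0))   # prints [1.0, 0.0, 0.0]
```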


Appendix II 

Proof of Proposition 1

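We prove the stability of the primal-dual algorithm using the theory of Lyapunov stability (see, for example, [41, Section 3.10]). This proof is based on the proof of Theorem 3.7 of [41]. The Lagrangian for problem (10) is as follows:

$$ L(x, p, \lambda) = U(x) - \sum_{t\in T}\sum_{i\in\mathcal{N}} p_i^{(t)} \Big( \sum_{\{j|(i,j)\in\mathcal{A}\}} x_{ij}^{(t)} - \sum_{\{j|(j,i)\in\mathcal{A}\}} x_{ji}^{(t)} - \sigma_i^{(t)} \Big) + \sum_{t\in T}\sum_{(i,j)\in\mathcal{A}} \lambda_{ij}^{(t)} x_{ij}^{(t)}. $$

The function $U$ is strictly concave, since $f_{ij}$ is a monotonically increasing, strictly convex function and $z'_{ij}$ is a strictly convex function of $x_{ij}$. Hence there exists a unique optimal solution for problem (10), say $\hat x$, and Lagrange multipliers, say $\hat p$ and $\hat\lambda$, which satisfy the following Karush-Kuhn-Tucker conditions:

$$
\begin{aligned}
\frac{\partial U(\hat x)}{\partial x_{ij}^{(t)}} - \big(\hat p_i^{(t)} - \hat p_j^{(t)}\big) + \hat\lambda_{ij}^{(t)} &= 0, && \forall\,(i,j)\in\mathcal{A},\ t\in T, && (27)\\
\sum_{\{j|(i,j)\in\mathcal{A}\}} \hat x_{ij}^{(t)} - \sum_{\{j|(j,i)\in\mathcal{A}\}} \hat x_{ji}^{(t)} &= \sigma_i^{(t)}, && \forall\, i\in\mathcal{N},\ t\in T, && (28)\\
\hat x_{ij}^{(t)} &\ge 0, && \forall\,(i,j)\in\mathcal{A},\ t\in T, && (29)\\
\hat\lambda_{ij}^{(t)} &\ge 0, && \forall\,(i,j)\in\mathcal{A},\ t\in T, && (30)\\
\hat\lambda_{ij}^{(t)}\, \hat x_{ij}^{(t)} &= 0, && \forall\,(i,j)\in\mathcal{A},\ t\in T. && (31)
\end{aligned}
$$

The left-hand side of (27) is $\partial L(\hat x, \hat p, \hat\lambda)/\partial x_{ij}^{(t)}$. From Equation (26), it can be verified that $(\hat x, \hat p, \hat\lambda)$ is an equilibrium point of the primal-dual algorithm. We now prove that this point is globally, asymptotically stable.

Consider the following function as a candidate for the Lyapunov function: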


V{x,p,X) 


A‘) 


= E E 

teT I (ij)GA 


kff{cr) 


{a-xff)da 


tj "'ij 


At) 


(7 - Xff)dl 


IxA rn[f{j) 


„(*) 


E 

i&N "'P 


1 






{P-pffd(3 
Note that $V(\hat{x},\hat{p},\hat{\lambda}) = 0$. Since $k_{ij}^{(t)}(\alpha) > 0$, if $x_{ij}^{(t)} \ne \hat{x}_{ij}^{(t)}$, we have

$$\int_{\hat{x}_{ij}^{(t)}}^{x_{ij}^{(t)}} \frac{1}{k_{ij}^{(t)}(\alpha)}\left(\alpha - \hat{x}_{ij}^{(t)}\right)d\alpha > 0.$$

This argument can be extended to the other terms as well. Thus, whenever $(x,p,\lambda) \ne (\hat{x},\hat{p},\hat{\lambda})$, we have $V(x,p,\lambda) > 0$.
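The computation of $\dot{V}$ that follows leans on the summation-by-parts identity $p'y = q'x$, which holds for any node prices $p$ and arc flows $x$ because each arc $(i,j)$ contributes $x_{ij}$ to $y_i$ with a plus sign and to $y_j$ with a minus sign. A quick numerical check on a hypothetical random network with a single session (the function name `identity_gap` is illustrative):

```python
import random


def identity_gap(num_nodes: int = 6, num_arcs: int = 12, seed: int = 1) -> float:
    """Return |p'y - q'x| for random prices p and flows x on a random digraph.

    num_arcs must not exceed num_nodes * (num_nodes - 1).
    """
    rng = random.Random(seed)
    arcs = set()
    while len(arcs) < num_arcs:
        i, j = rng.sample(range(num_nodes), 2)  # distinct endpoints, no self-loops
        arcs.add((i, j))
    x = {arc: rng.uniform(0.0, 1.0) for arc in sorted(arcs)}
    p = [rng.uniform(-1.0, 1.0) for _ in range(num_nodes)]
    y = [0.0] * num_nodes                        # y_i: net outflow at node i
    for (i, j), flow in x.items():
        y[i] += flow
        y[j] -= flow
    p_dot_y = sum(p_i * y_i for p_i, y_i in zip(p, y))
    q_dot_x = sum((p[i] - p[j]) * flow for (i, j), flow in x.items())  # q_ij = p_i - p_j
    return abs(p_dot_y - q_dot_x)
```

The gap is zero up to floating-point rounding for any seed, since the identity is purely algebraic.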

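Now,

$$\dot{V} = \sum_{t\in T}\Bigg\{\sum_{(i,j)\in A}\Bigg[\left(-x_{ij}^{(t)}\right)^{+}_{\lambda_{ij}^{(t)}}\left(\lambda_{ij}^{(t)} - \hat{\lambda}_{ij}^{(t)}\right) + \Bigg(\frac{\partial U(x)}{\partial x_{ij}^{(t)}} - q_{ij}^{(t)} + \lambda_{ij}^{(t)}\Bigg)\left(x_{ij}^{(t)} - \hat{x}_{ij}^{(t)}\right)\Bigg] + \sum_{i\in N}\left(y_i^{(t)} - \sigma_i^{(t)}\right)\left(p_i^{(t)} - \hat{p}_i^{(t)}\right)\Bigg\},$$

where $q_{ij}^{(t)} := p_i^{(t)} - p_j^{(t)}$ and $y_i^{(t)} := \sum_{\{j|(i,j)\in A\}} x_{ij}^{(t)} - \sum_{\{j|(j,i)\in A\}} x_{ji}^{(t)}$. Note that

$$\left(-x_{ij}^{(t)}\right)^{+}_{\lambda_{ij}^{(t)}}\left(\lambda_{ij}^{(t)} - \hat{\lambda}_{ij}^{(t)}\right) \le -x_{ij}^{(t)}\left(\lambda_{ij}^{(t)} - \hat{\lambda}_{ij}^{(t)}\right),$$

since the inequality is an equality if either $x_{ij}^{(t)} \le 0$ or $\lambda_{ij}^{(t)} > 0$; and, in the case when $x_{ij}^{(t)} > 0$ and $\lambda_{ij}^{(t)} \le 0$, we have $(-x_{ij}^{(t)})^{+}_{\lambda_{ij}^{(t)}} = 0$ and, since $\hat{\lambda}_{ij}^{(t)} \ge 0$, $-x_{ij}^{(t)}(\lambda_{ij}^{(t)} - \hat{\lambda}_{ij}^{(t)}) \ge 0$. Therefore,

$$\begin{aligned}
\dot{V} &\le \sum_{t\in T}\Bigg\{\sum_{(i,j)\in A}\Bigg[-x_{ij}^{(t)}\left(\lambda_{ij}^{(t)} - \hat{\lambda}_{ij}^{(t)}\right) + \Bigg(\frac{\partial U(x)}{\partial x_{ij}^{(t)}} - q_{ij}^{(t)} + \lambda_{ij}^{(t)}\Bigg)\left(x_{ij}^{(t)} - \hat{x}_{ij}^{(t)}\right)\Bigg] + \sum_{i\in N}\left(y_i^{(t)} - \sigma_i^{(t)}\right)\left(p_i^{(t)} - \hat{p}_i^{(t)}\right)\Bigg\}\\
&= (\hat{q} - q)'(x - \hat{x}) + (p - \hat{p})'(y - \hat{y}) + \sum_{t\in T}\sum_{(i,j)\in A}\Bigg[-x_{ij}^{(t)}\left(\lambda_{ij}^{(t)} - \hat{\lambda}_{ij}^{(t)}\right) + \Bigg(\frac{\partial U(x)}{\partial x_{ij}^{(t)}} - \hat{q}_{ij}^{(t)} + \lambda_{ij}^{(t)}\Bigg)\left(x_{ij}^{(t)} - \hat{x}_{ij}^{(t)}\right)\Bigg]\\
&= \left(\nabla U(x) - \nabla U(\hat{x})\right)'(x - \hat{x}) - \lambda'\hat{x},
\end{aligned}$$

where the last line follows from the Karush–Kuhn–Tucker conditions (27)–(31) and the fact that, for any $p$ and $x$,

$$p'y = \sum_{t\in T}\sum_{i\in N} p_i^{(t)}\Bigg(\sum_{\{j|(i,j)\in A\}} x_{ij}^{(t)} - \sum_{\{j|(j,i)\in A\}} x_{ji}^{(t)}\Bigg) = \sum_{t\in T}\sum_{(i,j)\in A} x_{ij}^{(t)}\left(p_i^{(t)} - p_j^{(t)}\right) = q'x,$$

so that $(\hat{q} - q)'(x - \hat{x}) + (p - \hat{p})'(y - \hat{y}) = 0$. Thus, owing to the strict concavity of $U(x)$, we have $(\nabla U(x) - \nabla U(\hat{x}))'(x - \hat{x}) \le 0$, with equality if and only if $x = \hat{x}$, and hence $\dot{V} \le -\lambda'\hat{x}$. So it follows that $\dot{V} \le 0$ for all $\lambda \ge 0$, since $\hat{x} \ge 0$.

If the initial choice of $\lambda$ is such that $\lambda(0) \ge 0$, we see from the primal-dual algorithm that $\lambda(\tau) \ge 0$ for all $\tau \ge 0$. This is true since $\dot{\lambda} \ge 0$ whenever $\lambda \le 0$. Thus, it follows by the theory of Lyapunov stability that the algorithm is indeed globally, asymptotically stable.

Appendix III
Proof of Proposition 2

Suppose $(x, z)$ is a feasible solution to problem (16). Then, for all $(i,j) \in A'$ and $t \in T$,

$$\begin{aligned}
\sum_{m=m(i,j)}^{M_i} z_{iJ_m} &\ge \sum_{m=m(i,j)}^{M_i}\sum_{k\in J_m} x_{iJ_mk}^{(t)}\\
&= \sum_{k\in J_{M_i}}\ \sum_{m=\max(m(i,j),\,m(i,k))}^{M_i} x_{iJ_mk}^{(t)}\\
&\ge \sum_{k\in J_{M_i}\setminus J_{m(i,j)-1}}\ \sum_{m=\max(m(i,j),\,m(i,k))}^{M_i} x_{iJ_mk}^{(t)}\\
&= \sum_{k\in J_{M_i}\setminus J_{m(i,j)-1}}\ \sum_{m=m(i,k)}^{M_i} x_{iJ_mk}^{(t)}\\
&= \sum_{k\in J_{M_i}\setminus J_{m(i,j)-1}} x_{ik}^{(t)}.
\end{aligned}$$

Hence $(x, z)$ is a feasible solution of problem (??) with the same cost.

Now suppose $(x, z)$ is an optimal solution of problem (??). Since $f_{iJ_1}(\zeta) < f_{iJ_2}(\zeta) < \cdots < f_{iJ_{M_i}}(\zeta)$ for all $\zeta > 0$ and $i \in N$ by assumption, it follows that, for all $i \in N$, the sequence $z_{iJ_1}, z_{iJ_2}, \ldots, z_{iJ_{M_i}}$ is given recursively, starting from $m = M_i$, by

$$z_{iJ_m} = \max_{t\in T}\Bigg\{\sum_{k\in J_{M_i}\setminus J_{m-1}} x_{ik}^{(t)}\Bigg\} - \sum_{l=m+1}^{M_i} z_{iJ_l}.$$

Hence, since $J_{M_i}\setminus J_{m-1} \supseteq J_{M_i}\setminus J_m$ and the flows are nonnegative, $z_{iJ_m} \ge 0$ for all $i \in N$ and $m = 1, 2, \ldots, M_i$. We then set, starting from $m = M_i$ and $j \in J_{M_i}$,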


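$$\tilde{x}_{iJ_mj}^{(t)} := \min\Bigg\{x_{ij}^{(t)} - \sum_{l=m+1}^{M_i}\tilde{x}_{iJ_lj}^{(t)},\ z_{iJ_m} - \sum_{k}\tilde{x}_{iJ_mk}^{(t)}\Bigg\},$$

where the last sum runs over the nodes $k \in J_m$ already considered at step $m$.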




It is now not difficult to see that $(\tilde{x}, z)$ is a feasible solution of problem (16) with the same cost.

Therefore, the optimal costs of problems (16) and (??) are the same and, since the objective functions for the two problems are the same, $z$ is part of an optimal solution for problem (16) if and only if it is part of an optimal solution for problem (??).
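The top-down recursion for $z_{iJ_m}$ can be checked on a small hypothetical instance. The Python sketch below assumes nested neighbour sets $J_1 \subset J_2 \subset \cdots \subset J_{M_i}$ with $J_0 := \emptyset$, given nonnegative flows $x_{ik}^{(t)}$, and the recursion $z_{iJ_m} = \max_{t}\{\sum_{k\in J_{M_i}\setminus J_{m-1}} x_{ik}^{(t)}\} - \sum_{l=m+1}^{M_i} z_{iJ_l}$. The function name `hyperarc_rates` and the data layout are illustrative choices, not taken from the paper.

```python
from typing import Dict, List, Sequence, Set


def hyperarc_rates(J: Sequence[Set[str]], x: Dict[str, List[float]]) -> List[float]:
    """Compute z_{iJ_1}, ..., z_{iJ_M} top-down for one node i.

    J[m-1] is the neighbour set J_m (nested: J_1 subset ... subset J_M);
    x[k][t] is the flow x_ik^(t) from node i to neighbour k for session t.
    """
    M = len(J)
    sessions = range(len(next(iter(x.values()))))
    z = [0.0] * M
    tail = 0.0                                  # sum_{l=m+1}^{M} z_{iJ_l}
    for m in range(M, 0, -1):                   # m = M, M-1, ..., 1
        prev = J[m - 2] if m >= 2 else set()    # J_{m-1}, with J_0 empty
        reached = J[M - 1] - prev               # J_M \ J_{m-1}
        demand = max(sum(x[k][t] for k in reached) for t in sessions)
        z[m - 1] = demand - tail
        tail += z[m - 1]
    return z
```

With `J = [{"a"}, {"a", "b"}, {"a", "b", "c"}]` and `x = {"a": [0.5, 0.2], "b": [0.1, 0.4], "c": [0.3, 0.1]}`, the function returns values close to `[0.4, 0.2, 0.3]`. Because the sets $J_{M_i}\setminus J_{m-1}$ grow as $m$ decreases, each rate is a difference of nondecreasing maxima and is therefore nonnegative, and every constraint $\sum_{k\in J_{M_i}\setminus J_{m-1}} x_{ik}^{(t)} \le \sum_{l=m}^{M_i} z_{iJ_l}$ holds, with equality for the maximizing session.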